Standard Deviation ... some thoughts

suggested by John B.
When doing the calculations necessary to generate Bollinger Bands one normally ...
>Bollinger?
Don't you remember? From the above link:


 Collect stock prices for the past N days: P_{1}, P_{2}, ... P_{N}
 Calculate M, their Mean (or Average) via:
M = (1/N)(P_{1} + P_{2} + ... + P_{N}) = (1/N)ΣP_{m}
where Σ means Sum the terms and P_{1} is the Price N days ago and P_{N} the most recent Price
 Calculate SD, the Standard Deviation of this set of Prices via:
Variance = SD^{2}
= (1/N)(P_{1}^{2} + P_{2}^{2} + ... + P_{N}^{2}) - M^{2}
= (1/N)ΣP_{m}^{2} - [(1/N)ΣP_{m}]^{2}
(the average square) minus (the square of the average) ... which is the same as (1/N)Σ(P_{m} - M)^{2}
 Pick a small number k (example k = 1 or maybe 1.5 or maybe 2) and calculate:
the Upper Bollinger Band via: BU = M + k SD
the Lower Bollinger Band via: BL = M - k SD
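Here's a minimal Python sketch of the steps above for a single window of N prices. The function name bollinger and the sample prices are made up for illustration; it assumes the 1/N variance (average square minus square of the average), as in the formulas above.

```python
import math

def bollinger(prices, k=2.0):
    """Return (M, SD, BU, BL) for one window of prices, using the 1/N variance."""
    n = len(prices)
    mean = sum(prices) / n
    # (the average square) minus (the square of the average)
    variance = sum(p * p for p in prices) / n - mean * mean
    sd = math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off
    return mean, sd, mean + k * sd, mean - k * sd

# Made-up prices, not GE data.
m, sd, bu, bl = bollinger([10.0, 11.0, 12.0, 11.0, 10.0], k=2.0)
print(f"M = {m:.2f}, SD = {sd:.4f}, BU = {bu:.2f}, BL = {bl:.2f}")
```

To get bands like those in a chart, one would call this once per day on the most recent N prices.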
Here's an example: GE stock, from November, 2000 to July, 2003 and ...
>So? N = ? k = ?
Patience! In Figure 1, I used N = 20 days and k = 2.0 Standard Deviations.
>And the stock price bounces off the Bollinger Bands, eh?
That's the idea. To see the geometry more clearly (in this example), we can blow up a part of the chart:
Figure 2
 Figure 1

>Yeah, but isn't that unusual? I mean, the Standard Deviation of Stock Prices?
Yes, but it works pretty well, eh? It suggests when to BUY or SELL.
However, one normally calculates (and plays with) the Standard Deviation of Stock Returns, not Stock Prices, but ...
>Don't tell me! That's the purpose of this tutorial, right?
Right. To investigate the relationship between the SD of Stock Prices and the SD of Stock Returns.
For the GE example above, we get Figure 3.
In what follows, we'll be considering the daily stock Gain Factors rather than stock Returns.
By Gain Factors I mean g_{1} = P_{1}/P_{0} and g_{2} = P_{2}/P_{1} etc.
 Figure 3

Let's suppose that:
 The Stock Price, N+1 days ago, was P_{0}
 Let the daily Gain Factors over the past N days be g_{1}, g_{2}, ... g_{N}
 Let G_{m} be the cumulative Gain Factor over the first m days, so
G_{m} = g_{1}g_{2}...g_{m}.
 Then the Stock Prices are:
P_{1} = P_{0} g_{1} = P_{0} G_{1}
P_{2} = P_{0} g_{1}g_{2} = P_{0} G_{2}
P_{3} = P_{0} g_{1}g_{2}g_{3} = P_{0} G_{3}
...
P_{N} = P_{0} g_{1}g_{2}...g_{N} = P_{0} G_{N}
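The bookkeeping above can be sketched in Python as follows. The function names and the three sample Gain Factors are hypothetical, chosen only to show that prices are P_{0} times the cumulative products G_{m}.

```python
from itertools import accumulate
from operator import mul

def prices_from_gains(p0, gains):
    """Return [P_1, ..., P_N] where P_m = P_0 * g_1 * g_2 * ... * g_m."""
    cumulative = accumulate(gains, mul)   # G_1, G_2, ..., G_N
    return [p0 * G for G in cumulative]

def gains_from_prices(prices):
    """Inverse: g_m = P_m / P_{m-1} for a list [P_0, P_1, ..., P_N]."""
    return [b / a for a, b in zip(prices, prices[1:])]

example_gains = [1.01, 0.99, 1.02]       # made-up values, not GE data
example_prices = prices_from_gains(10.0, example_gains)
print(example_prices)
```

Note that N gains need N+1 prices: gains_from_prices expects the list to start with P_{0}.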
The question is:
If we know the Mean and SD of the g's, what's the Mean and SD of g_{1} + g_{1}g_{2} + g_{1}g_{2}g_{3} + ... + g_{1}g_{2}g_{3}...g_{m} ?

>Huh? Why is that the question?
Because, whereas the set g_{1}, g_{2}, g_{3}, ... are the daily Stock Gain Factors,
the daily Stock Prices depend upon
g_{1}, g_{1}g_{2}, g_{1}g_{2}g_{3}, ... and we're interested in
determining the relationship between ...
>Yeah, I remember: the relationship between the g's and the G's.
Right, because the G's are products of the g's, like g_{1}g_{2}g_{3}...
We assume we know the parameters for the daily Gain Factors g_{1}, g_{2}, g_{3}, ... (assumed to be random variables), namely Mean(g_{m}) = g and SD(g_{m}) = s.
Okay, now we have the following magic formula:
As noted here:
Mean(xy) - Mean(x) Mean(y) = SD(x) SD(y) R
where R is the Pearson correlation between the random variables x and y.
We'll assume that our Gain Factors g_{m} are independent, so R = 0 so that
Mean(g_{i}g_{j}) = Mean(g_{i}) Mean(g_{j})
... and that extends to a product with umpteen terms. That is, the Mean of a Product is the Product of the Means.
>The g's are independent? Can you assume that?
Why not? We could incorporate all the cross-correlations but day-to-day gains (usually) have small correlation.
For example, the Pearson correlation between successive daily gains, for the GE example, above, is about 1.6% so it seems
reasonable (and greatly simplifies the math) ...
>Yeah, yeah. Sounds like a math gimmick to me.
A math gimmick, eh? Yes, analysts often use such gimmicks, like assuming a Lognormal distribution. It makes life easier.
Anyway, if the g's are uncorrelated, we say that
Mean(g_{1}g_{2}... g_{m}) = Mean(g_{1}) Mean(g_{2})...Mean(g_{m})
(where the g's are random numbers selected from some distribution).
Then:
Mean(g_{1}g_{2}... g_{m}) = g^{m}.
Hence, the Mean of N successive cumulative Gain Factors = Mean{(1/N)[g_{1} + g_{1}g_{2} + g_{1}g_{2}g_{3} + ... + g_{1}g_{2}g_{3}...g_{N}]}
= (1/N)[g + g^{2} + g^{3} + ... + g^{N}]
and we have a formula for the sum of that series:
Mean of N successive cumulative Gain Factors = (1/N)[g(g^{N} - 1)/(g - 1)]
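As a quick check on the series formula, this Python sketch compares the direct sum with the closed form. The function names are made up, and g = 1.01 with N = 20 is just an illustrative choice.

```python
def mean_cumulative_gain(g, n):
    """(1/N)[g + g^2 + ... + g^N] by direct summation."""
    return sum(g ** j for j in range(1, n + 1)) / n

def mean_cumulative_gain_closed(g, n):
    """Closed form (1/N)[g(g^N - 1)/(g - 1)]; returns 1 when g == 1."""
    if g == 1.0:
        return 1.0   # every term of the series equals 1
    return g * (g ** n - 1.0) / (g - 1.0) / n

direct = mean_cumulative_gain(1.01, 20)
closed = mean_cumulative_gain_closed(1.01, 20)
print(f"direct = {direct:.6f}, closed form = {closed:.6f}")
```

The g == 1 branch is there because the closed form divides by g - 1.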
Okay, now on to the Standard Deviation:
For independent random variables x and y, we have another magic formula:
SD^{2}(x*y) = Mean^{2}(x)SD^{2}(y) + Mean^{2}(y)SD^{2}(x) + SD^{2}(x)SD^{2}(y)
We can write this as:
[*]
SD^{2}(g_{1}g_{2}...g_{m-1}g_{m}) = Mean^{2}(g_{1}g_{2}...g_{m-1}) SD^{2}(g_{m}) + Mean^{2}(g_{m}) SD^{2}(g_{1}g_{2}...g_{m-1}) + SD^{2}(g_{1}g_{2}...g_{m-1}) SD^{2}(g_{m})
or
[*]
SD^{2}(g_{1}g_{2}...g_{m-1}g_{m}) = (g^{2}+s^{2}) SD^{2}(g_{1}g_{2}...g_{m-1}) + g^{2m-2}s^{2}
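The product rule for independent x and y, which is what [*] rests on, can be checked exactly (no random sampling) with two small discrete distributions. This is only a sketch: the distributions dx and dy and the helper names are invented for illustration.

```python
from itertools import product as cartesian

def moments(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum(p * (v - mean) ** 2 for v, p in dist.items())
    return mean, var

def product_moments(dx, dy):
    """Exact mean and variance of x*y for independent x ~ dx and y ~ dy."""
    pairs = [(x * y, px * py)
             for (x, px), (y, py) in cartesian(dx.items(), dy.items())]
    mean = sum(v * p for v, p in pairs)
    var = sum(p * (v - mean) ** 2 for v, p in pairs)
    return mean, var

# Two made-up gain-factor distributions (illustrative values only).
dx = {0.98: 0.3, 1.00: 0.4, 1.03: 0.3}
dy = {0.99: 0.5, 1.02: 0.5}

mx, vx = moments(dx)
my, vy = moments(dy)
mean_xy, var_xy = product_moments(dx, dy)
formula_var = mx**2 * vy + my**2 * vx + vx * vy
print(mean_xy, mx * my)        # Mean of product vs product of Means
print(var_xy, formula_var)     # SD^2 of product vs the magic formula
```

Independence enters where the joint probability is taken as px * py; with correlated variables neither line would be expected to match.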
>This looks awfully familiar.
Yes, we did it here, but we'll repeat it just to avoid going back and forth:
Using [*] and proceeding step-by-step, we get:
[!!]
Product                  | Mean^{2} | SD^{2}
g_{1}                    | g^{2}    | (g^{2}+s^{2}) - g^{2} = s^{2}
g_{1}g_{2}               | g^{4}    | (g^{2}+s^{2})^{2} - g^{4}
g_{1}g_{2}g_{3}          | g^{6}    | (g^{2}+s^{2})^{3} - g^{6}
...                      | ...      | ...
g_{1}g_{2}g_{3}...g_{m}  | g^{2m}   | (g^{2}+s^{2})^{m} - g^{2m}
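To see that stepping through [*] really gives the rightmost column, here is a small Python sketch that iterates the recursion and compares it with (g^{2}+s^{2})^{m} - g^{2m}. The function names and the values g = 1.01, s = 0.01 are illustrative assumptions.

```python
def variance_by_recursion(g, s, m):
    """Iterate [*]: V_m = (g^2 + s^2) V_{m-1} + g^(2m-2) s^2, starting from V_0 = 0."""
    v = 0.0
    for j in range(1, m + 1):
        v = (g * g + s * s) * v + g ** (2 * j - 2) * s * s
    return v

def variance_closed_form(g, s, m):
    """Last row of the table: (g^2 + s^2)^m - g^(2m)."""
    return (g * g + s * s) ** m - g ** (2 * m)

g, s = 1.01, 0.01   # illustrative daily Mean and SD of the Gain Factor
for m in (1, 2, 3, 20):
    print(m, variance_by_recursion(g, s, m), variance_closed_form(g, s, m))
```

Starting from V_0 = 0 treats the empty product as the constant 1, which has no variance.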

To get the Variance (or SD^{2}) for N cumulative Gain Factors, we could consider adding the variances for each (in the rightmost column, above) but ...
>I assume that the variance of a sum is the sum of the variances?
Only for independent random variables, but the terms g_{1}, g_{1}g_{2}, g_{1}g_{2}g_{3} etc. are NOT independent.
Finally, then, if we add the variances anyway, we have:
SD^{2}(g_{1} + g_{1}g_{2} + g_{1}g_{2}g_{3} + ... + g_{1}g_{2}g_{3}...g_{N}) = Σ(g^{2}+s^{2})^{j} - Σg^{2j} (j running from j = 1 to j = N)
>And these g_{1}g_{2} and g_{1}g_{2}g_{3} ... they're all independent?
Uh ... not really, but for starters we'll assume that they are.
>Don't you have an "altogether now"?
Note that, when multiplied by the starting Stock Price P_{0},
the cumulative Gain Factors G_{1} = g_{1}, G_{2} = g_{1}g_{2} etc. are
just the successive daily Stock Prices.
Altogether now:
[A]
If the daily stock Gain Factors, g_{1}, g_{2}, ... are uncorrelated random variables with Mean(g) = g and SD(g) = s then:
Mean of N successive Stock Prices = (P_{0}/N)[g(g^{N} - 1)/(g - 1)] and
Variance of N successive Stock Prices = P_{0}^{2} [(g^{2}+s^{2}){(g^{2}+s^{2})^{N} - 1}/(g^{2}+s^{2} - 1) - g^{2}{g^{2N} - 1}/(g^{2} - 1)]
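The two formulas in [A] are easy to put into code. The sketch below is one way to do it in Python; the function names are mine, and the inputs (P_{0} = $10, g = 1.01, s = 0.01, N = 20) are just illustrative. It assumes g is not exactly 1, since the closed forms divide by g - 1 and g^{2} - 1.

```python
import math

def formula_mean_price(p0, g, n):
    """Mean of N successive Stock Prices from [A]: (P_0/N)[g(g^N - 1)/(g - 1)]."""
    return p0 / n * g * (g ** n - 1.0) / (g - 1.0)

def formula_variance_price(p0, g, s, n):
    """Variance of N successive Stock Prices from [A] (requires g != 1)."""
    a = g * g + s * s          # Mean of g^2
    b = g * g                  # square of the Mean of g
    sum_a = a * (a ** n - 1.0) / (a - 1.0)   # sum of a^j, j = 1..N
    sum_b = b * (b ** n - 1.0) / (b - 1.0)   # sum of b^j, j = 1..N
    return p0 * p0 * (sum_a - sum_b)

print(round(formula_mean_price(10.0, 1.01, 20), 2))
print(math.sqrt(formula_variance_price(10.0, 1.01, 0.01, 20)))
```

The second line printed is SD = SQRT[Variance], in dollars.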

>You've used a magic formula to add up Σ(g^{2}+s^{2})^{j} - Σg^{2j} ?
Yes.
>I figure there's a lot of handwaving there. Can you provide some real life ...?
Let's check out the efficacy of these magic formulas, okay?
>Efficacy?
Pay attention.
 We generate a set of 20 randomly selected daily returns
from a lognormal distribution with Mean = 1% and SD = 1%.
 Using the magic formula above,
namely (P_{0}/N)[g(g^{N} - 1)/(g - 1)]
where P_{0} = $10, N = 20 days and daily Gain Factor g = 1+DailyReturn = 1.01
we get a (formula-generated) Mean Stock Price as:
Mean Stock Price (over 20 days) = (10/20)[1.01(1.01^{20} - 1)/(1.01 - 1)] = $11.12
 Now we calculate the actual Mean of this set of 20 Stock Prices (starting at P_{0} = $10.00).
 Finally, we calculate the percentage error between the actual Mean and the formula-generated Mean of $11.12
 Then we repeat all of the above steps umpteen times ... to see how good (or bad) the formula is.
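The experiment in the steps above can be sketched in Python like this. It is not the spreadsheet's code: the function names, the seed, and the way the lognormal is parameterized (so that the Gain Factor itself has Mean 1.01 and SD 0.01) are my assumptions.

```python
import math
import random

def lognormal_params(mean, sd):
    """mu, sigma of the underlying normal so the lognormal has this mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def one_trial(rng, p0=10.0, n=20, mean_g=1.01, sd_g=0.01):
    """Return (actual mean price, formula mean price, percentage error) for one set."""
    mu, sigma = lognormal_params(mean_g, sd_g)
    price, prices = p0, []
    for _ in range(n):
        price *= rng.lognormvariate(mu, sigma)   # P_m = P_{m-1} * g_m
        prices.append(price)
    actual = sum(prices) / n
    formula = p0 / n * mean_g * (mean_g ** n - 1.0) / (mean_g - 1.0)
    return actual, formula, 100.0 * (formula - actual) / actual

rng = random.Random(2003)          # fixed seed so the run is repeatable
trials = [one_trial(rng) for _ in range(12)]
for actual, formula, err in trials:
    print(f"actual = {actual:6.2f}  formula = {formula:6.2f}  error = {err:+.2f}%")
```

Each line is one set of 20 days; the formula value is the same every time because it depends only on P_{0}, g and N.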
 Figure 4

For a dozen sets of 20-day Stock Prices, we get Figure 4.
>Yeah, so the formula gives a reasonable estimate for the Mean of Stock Prices, but what about GE and what about SD and ... ?
Patience, but I should point out that the formula for the SD is ... uh, lousy.
Look at Figure 5, for the GE example we started with.
It shows, in blue, the actual moving 20-day Average Stock Price and,
in red, the Average according to the above formula. That is, we take the Mean daily Gain Factor for the past
20 days (that's g) then we use the formula to estimate the average stock Price, namely:
(P_{0}/N)[g(g^{N} - 1)/(g - 1)]
>Huh? That's g? What's g?
Yes. If the Stock Price 21 days ago was P_{0} = $39.40 and the average daily gain (over the past N = 20 days) is 0.234%,
then we use (for that 20-day period)
g = 1+AverageGain = 1.00234 and
(P_{0}/N)[g(g^{N} - 1)/(g - 1)]
and plot the point (39.40/20)[1.00234(1.00234^{20} - 1)/(1.00234 - 1)] = $40.38
... in red.
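The day-by-day procedure just described might be sketched as follows. The function name rolling_means is hypothetical, and the test series is made up (steady growth of 0.234% per day from $39.40), a case in which the formula should reproduce the actual mean.

```python
def rolling_means(prices, n=20):
    """For each day with n+1 prices of history, return (actual mean, formula mean)."""
    out = []
    for t in range(n, len(prices)):
        window = prices[t - n:t + 1]          # P_0, P_1, ..., P_N
        p0 = window[0]
        gains = [b / a for a, b in zip(window, window[1:])]
        g = sum(gains) / n                    # average daily Gain Factor
        actual = sum(window[1:]) / n          # mean of P_1 ... P_N
        if abs(g - 1.0) < 1e-12:
            formula = p0                      # limit of the formula as g -> 1
        else:
            formula = p0 / n * g * (g ** n - 1.0) / (g - 1.0)
        out.append((actual, formula))
    return out

# Made-up series growing at exactly 0.234% per day, starting at $39.40.
series = [39.40 * 1.00234 ** j for j in range(30)]
actual, formula = rolling_means(series)[0]
print(round(actual, 2), round(formula, 2))
```

With real prices the daily gains are not all equal, so the two numbers would differ from day to day.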
 Figure 5

Figure 6

>Mamma mia! Something smooth, like the 20day moving average, turns into ...
Something scary ... which explains why the SD = Volatility formula needs some work.
>So what does it look like, for the GE example?
Can't you see Figure 6? There, I looked at the actual SD (for Prices) over the previous 20 days and compared it with the
formula-generated SD (using, each day, the 20-day moving average daily Gain Factor g ... and SD = SQRT[Variance]
from [A], above).
Although the Formula Mean stays close to the actual Mean (over 20 days, as in Figure 5), those wee oscillations are
pretty wild and they generate a wild and woolly volatility.
>But the volatility in Prices is pretty wild too.
True, so maybe trying to generate a formula which mimics the actual Price volatility is pointless.

>So is it always that way or maybe with some other stock or ...
Okay. Consider these:
Figure 7
>Aha! Look at the S&P 500! The formula gives a smoother SD!
Hmmm. Interesting, eh?
>And how do Bolli Bands work ... for the S&P?
Figure 8 shows the S&P and 20-day, 2-SD Bollinger Bands and ...
>Yeah, so why can't you conjure up a formula which generates actual Price volatilities?
Uh ... senility?
>Besides, I thought you wanted to compare two Volatilities: Gains and Prices.
Instead you're comparing your Price Volatility formula with the actual Price Volatility.
Oh yeah ... I forgot.
 Figure 8

Figure 9

Okay, let's assume the Standard Deviation of Gains is s and we stare intently at the formula for the Standard Deviation of
Prices (from [A], above), namely:
SD(Prices) = P_{0}SQRT{(g^{2}+s^{2}){(g^{2}+s^{2})^{N} - 1}/(g^{2}+s^{2} - 1) - g^{2}{g^{2N} - 1}/(g^{2} - 1)}
If we pick a few daily Gain Factors (averaged over 20 days ... that's g) and pick P_{0} = $1.00 then see how
SD(Prices) varies with s... that's SD(Gains) ... we get Figure 9.
Remember, g is the average Gain Factor, namely 1+AverageReturn.
>It looks linear, eh?
Yes, for these particular parameters. We're talking about DAILY parameters
... and g is close to "1" and s is small.

>But it doesn't change much with the average daily gain. All the curves are ...
Close together? Yes, so let's analyze.
If we put g = 1+R, where R is the (small) 20-day average daily return, then we can use the fact that,
for small values of x and y,
(1+x)^{m} = 1+mx (approximately) and (1+x)(1+y) = 1+x+y (approximately).
We use these in the magic formula, putting:
 g^{m} = (1+R)^{m} = 1+mR (approximately)
 g^{2}+s^{2} = 1+2R+s^{2} (approximately)
 (g^{2}+s^{2})^{N}
= (1+2R+s^{2})^{N} = 1+2NR+Ns^{2} (approximately).
The magic formula:
SD(Prices) = P_{0}SQRT[(g^{2}+s^{2}){(g^{2}+s^{2})^{N} - 1}/(g^{2}+s^{2} - 1) - g^{2}{g^{2N} - 1}/(g^{2} - 1)]
then becomes:
SD(Prices) = P_{0}SQRT[(1+2R+s^{2})(2NR+Ns^{2})/(2R+s^{2}) - (1+2R)(2NR)/(2R)]
or SD(Prices) = P_{0}SQRT[N(1+2R+s^{2}) - N(1+2R)]
or SD(Prices) = P_{0}SQRT[Ns^{2}]
or SD(Prices) = P_{0}SQRT[N]s = P_{0}SQRT[N]SD(Gains)
so
SD(Prices) / SD(Gains) = P_{0}SQRT[N]
Moral?
We might expect that the Standard Deviation of Prices is a multiple of the Standard Deviation of Gains,
the multiplier being proportional to the square root of N (the number of days, say N = 20) and the initial Price P_{0}
(that'd be 21 days ago).
>I thought we were talking 20 days, not 21.
Uh, you're right. It's 20 days ago, but we're considering a total of 21 days 'cause you need 21 prices to calculate 20 daily gains.
>Aah, I see. And the average daily gain g disappears.
It's not a daily gain, it's a daily gain factor, but yes, in this small-return approximation it cancels out, which explains why the various graphs in Figure 9
are so close together.
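As a rough numerical check on the small-return approximation, this Python sketch evaluates the Variance formula from [A] term by term and prints SD(Prices)/SD(Gains) for a few made-up values of R and s, with P_{0} = $1.00 and N = 20. The function name and parameter values are mine. In a check like this the ratio should hardly move with s and only a little with R, which is the proportionality described above. The constant is a different story: keeping only first-order terms in (1+x)^{N} drops terms like N(N-1)x^{2}/2, and after dividing by the small quantities 2R+s^{2} and 2R those terms are not negligible, so evaluating [A] directly gives a multiplier nearer to SQRT[N(N+1)/2] than to SQRT[N].

```python
import math

def sd_prices_formula(p0, g, s, n):
    """SD(Prices) from [A], summed term by term to avoid dividing by g^2 - 1."""
    a, b = g * g + s * s, g * g
    variance = sum(a ** j - b ** j for j in range(1, n + 1))
    return p0 * math.sqrt(variance)

N, P0 = 20, 1.0
for R in (0.0, 0.001, 0.002):            # illustrative average daily returns
    for s in (0.005, 0.01, 0.02):        # illustrative SD(Gains)
        ratio = sd_prices_formula(P0, 1.0 + R, s, N) / s
        print(f"R = {R:.3f}  s = {s:.3f}  SD(Prices)/SD(Gains) = {ratio:.3f}")

print("SQRT[N]        =", round(math.sqrt(N), 3))
print("SQRT[N(N+1)/2] =", round(math.sqrt(N * (N + 1) / 2), 3))
```

Since correlation doesn't care about a constant multiplier, dividing by P_{0}SQRT[N] or by P_{0}SQRT[N(N+1)/2] would give the same correlation with SD(Gains).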
Figure 10

>So the volatility of Prices is proportional to the volatility of Gains. Do you believe that?
Well, it changes day to day because that starting price, P_{0}, changes.
>And that's it? You've finished?
Well, suppose we look at our formula-generated Price Volatility and plot it against the actual Gain Volatility, except we'll "normalize" the
Price Volatility by dividing (at each daily calculation) by P_{0}SQRT[N] ... so the changing prices (as we move day to day) don't
influence the comparison. Then, for the GE example, we get Figure 10.
>I haven't the faintest idea of what Figure 10 is saying!
Then have a nap. I think I will ...

>But 98% correlation! Wow!
Don't get too excited. That's the correlation between our SD formula for Prices (divided by
P_{0}SQRT[N]) and the Standard Deviation of the actual Stock Gains, namely s ... and we've already shown that
they're (almost) proportional so we'd expect ...
>But what about the correlation between actual SD(Prices) and actual ...?
Like Figure 11?
>You're comparing formulas for Gain volatilities and Price
volatilities and actual Gain volatilities and Price volatilities. It's confusing!
Then have a nap.
>zzzZZZ
 Figure 11

Of course, y'all can play with a spreadsheet which looks like
this.
Just RIGHT-click and Save Target ... here.
See also this.
