Distributions of Returns

When somebuddy performs a mathematical ritual in order to generate (fictitious) returns from some known distribution of returns (for Monte Carlo simulations, for example), she (normally) does the following:
  1. Consider a chart which describes a known cumulative distribution, like Figure 1.
  2. Each point (X,Y) on the graph gives the probability that a return X is less than Y.
    (For example, in Figure 1, it says that there is a probability of 0.7, or 70%, that a return is less than 20%.)
  3. To generate a (random) collection of returns from the distribution described by Figure 1, she picks a random number, uniformly distributed between 0 and 1 (like 0.7).
  4. That's a randomly chosen Y-value, uniformly distributed. The corresponding X-value gives a (random) return, distributed according to Figure 1.
  5. Then she'd repeat this ritual a jillion times to get a jillion random returns (from the known distribution).
>Huh?
Using Excel, a Normal cumulative distribution with Mean 0.1 and Standard Deviation 0.2 can be obtained via:

      =NORMDIST(X, 0.1, 0.2, 1)


Figure 1

Note that NORMDIST gives a number between 0 and 1 (or, equivalently, between 0% and 100%).
>What's X and Y?
X is some return ... like 0.20, meaning 20%.
Y is the value generated by NORMDIST(0.20, 0.1, 0.2, 1) ... like 0.70 meaning 70%.
>So Y = NORMDIST(X, 0.1, 0.2, 1), right?
You got it!
>Do you have an example?
Yes. It's Figure 1. That's how I generated the chart ... using NORMDIST(X, 0.1, 0.2, 1) with X going from -0.50 to 0.60.
>So NORMDIST(0.2, 0.1, 0.2, 1) gives 0.7, eh?
Well, actually, it gives 0.691 (or 69.1%) ... but it looks like 70% on the chart and I wanted some simple number so I cheated a wee hit Now, if you wanted to know the probability that a random number (with the distribution as in Figure 1) lies between, say 0.15 and 0.25, you'd calculate:

      (probability that the number is less than 0.25) - (probability that the number is less than 0.15)

or       =NORMDIST(0.25, 0.1, 0.2, 1) - NORMDIST(0.15, 0.1, 0.2, 1)
Note that if the cumulative distribution has a large slope (as in Figure 1, with X near 0.10 or 10%), then small changes in X produce large changes in the probability.
The larger the slope at X, the greater the chance of a randomly chosen number lying in the neighbourhood of X.

If we plot the slope of the cumulative distribution, we'd get something like Figure 1A, where the bigger slopes are near the middle (where the Mean occurs).


Figure 1A
>Yeah, but you said, in step #3 above, that she'd choose the Y-value, not the X-value, and with NORMDIST you'd have to pick X.
Yes, you're quite right. I used NORMDIST to generate Figure 1, to illustrate a cumulative distribution and ...

>So how would I pick a Y-value and have Excel give me an X-value?
If you pick a Y then, to get X, you'd use:

      =NORMINV(Y, Mean, SD)

Note that NORMINV gives a return ... maybe between -0.50 and +0.60 (meaning -50% and 60%).

>That's confusing. I mean ...
Okay. We have (for example):

[1]            Y = NORMDIST(X, 0.1, 0.2, 1)

That's if we know the return X and we want the percentage Y.
However, if we know Y and we want X, we'd solve [1] for X. In Excel-speak, that'd be

[2]             X = NORMINV(Y, 0.1, 0.2)

>That's STILL confusing!
Okay, here's a simpler example:
If Y = X3 you'd get Y if you know X.
However, if you knew Y and wanted the corresponding X, you'd solve for X = Y1/3.
The two equations: Y = X3   and   X = Y1/3   are equivalent. They define two functions... one is the inverse of the other.
In our example, above, NORMDIST and NORMINV are inverse functions.


>So what's your point? Why this tutorial? Why ...?
You probably didn't pay attention to step #3, above. To get a collection of Xs corresponding to some known distribution (such as Figure 1), you'd pick a "uniformly distributed" random number Y between 0 and 1.
"Uniformly distributed" means the number Y is just as likely to be between 0.0 and 0.1 and it is to be between 0.1 and 0.2 or 0.2 and 0.3 etc.
That's what RAND() does, in Excel. If you generate 1,000,000 values of RAND() and determine how many lie in the 10 intervals (0.0, 0.1) and (0.2, 0.2) ... (0.9, 1.0) you should get 100,000 in each interval.
Remember [2]?

[2]             X = NORMINV(Y, 0.1, 0.2)

If Y = RAND(), then X will be a normally distributed return with Mean = 0.1 and SD = 0.2.
Note that a uniformly distributed Y would have a cumulative distribution like Figure 2A.
Note that the probability of being between Y and Y+dY is the same, regardless of the value of Y. That is, if the distribution is described by F(Y), then F(Y+dY) - F(Y) is the same for all Y. Remember: The larger the slope at Y, the greater the chance of a randomly chosen number lying in the neighbourhood of Y. In Figure 1, the slope is constant.

>I assume you're talking about Y and Y+dY both lying between 0 and 1, eh?
Oops. Sorry 'bout that. Yes, of course.

Okay, we've seen the distribution function for a "uniformly distributed" Y. That's Figure 2A.
What we want to do now is generate our random return using:

[3]             X = NORMINV(R, 0.1, 0.2)

where R will NOT be uniformaly distributed between 0 and 1 (as illustrated in Figure 2A) .
In fact, R will have a distribution like, maybe, Figure 2B.

Or maybe even something like Figure 2C, below  


Figure 2C

Figure 2A

Figure 2B
>You're kidding, right?
Not at all. We just need to invent some function, y = F(x), which increases from 0 to 1 as x increases from 0 to 1.
>Inside the red square?
Exactly.
For Figure 2A, we'll be choosing numbers evenly distributed in (0,1).
For Figure 2B, we'll tend to choose more of the larger numbers in (0,1)
For Figure 2C, we're more apt to to choose numbers near 0 or near 1.
>And how does that affect the random returns you generate?
Good question. Remember Figure 1?
If we select numbers between 0 and 1 (such as 0.7), and each selected number is equally probable (that's uniformly distributed 2A, eh?) then we'd get the normal distribution.

If, however, we choose more numbers near 1, we'd get more of the larger returns.
(That's Figure 2B.)

If, however, we choose more numbers near 0 and 1, we'd get more of the returns in the tails of the distribution.
(That's Figure 2C.)


Figure 1 (again)

Do you remember this chart?
>No!
It's from another tutorial.

The upper chart shows actual returns for some stock.

The bottom chart shows returns generated according to NORMINV(RAND(), Mean, SD)

>Yeah, now I remember. The "real" returns have lots of BIG returns, positive and negative.

Them's the tails of the distribution.
We should be able to reproduce that upper chart by using NORMINV(R, Mean, SD) where R has a distribution more like Figure 2C (instead of Figure 2A).

>So, what do you get?
Patience ...

Look at Figure 3. It shows the distribution of about 2000 daily returns for GE stock as well as Normal and Lognormal distributions with the same Mean and Volatility.
Notice that the actual distribution has a taller peak and fatter tails.

>Not much difference between normal and lognormal, eh?
True ... for daily returns. However ...

>Will Figure 2C give that taller peak?
Uh ... not exactly, so we'll consider a somewhat different tack.


Figure 3

for the next Part ...