Latin Hypercube and sampling stuff

Here's the problem.

  • We have a collection of, say, monthly (or yearly) market returns, dating back a jillion months (or years).
  • We compute the Mean and Standard Deviation of this collection.
  • We assume that the future evolution of this market (a stock, a mutual fund, whatever) is such that the future monthly (or yearly) returns will have the same Mean and Standard Deviation.
>You're kidding, right?
Pay attention.
  • We assume a Cumulative Probability Distribution with the prescribed Mean and SD. It might look like Fig. 1
    where we note that 0.7 (or 70%) of the returns are less than 20%, 0.1 (or 10%) are less than -16%, 0.5 (or 50%) are less than 10%, etc.
  • In general, we pick a y-value between 0 and 1 (like 0.7) and look at the corresponding x-value (like 20%) and say:
    "a fraction y of returns are less than x%"
>And this curve - the cumulative distribution - it only depends upon the Mean and SD?
Yes, if we assume a Normal or Log-normal distribution ... they're the most popular choices.
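To make the "pick y, read off x" idea concrete, here's a minimal sketch using Python's standard library. The Mean of 10% and SD of 20% are assumptions chosen because they roughly reproduce the values read off Fig. 1, and the name return_at is just an illustrative helper.

```python
from statistics import NormalDist

# Assumed for illustration: Normal distribution with Mean = 10%, SD = 20%.
returns = NormalDist(mu=0.10, sigma=0.20)

def return_at(y):
    """Return x such that a fraction y of returns are less than x."""
    return returns.inv_cdf(y)  # inverse of the cumulative distribution

for y in (0.1, 0.5, 0.7):
    print(f"a fraction {y} of returns are less than {return_at(y):.1%}")
```

With these assumed parameters the three x-values come out close to the -16%, 10% and 20% quoted above, though not exactly.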

Fig. 1
Okay, now we describe how to generate a sequence of returns for the future evolution of our investment.

  1. Pick a number at random, from 0 to 1. Call it y.
  2. Determine the corresponding x-value. That's our return.
  3. Apply this return to our investment.
  4. Repeat steps 1, 2 and 3 until we've estimated the value of our portfolio for the next twenty (or thirty or forty) years.
For example, if we repeat this process twenty times (meaning we pick twenty y-values at random, between 0 and 1), we might get points on the cumulative distribution, like Fig. 2

This procedure is what's normally used in Monte Carlo simulations, like the spreadsheet described here.
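Steps 1 to 4 can be sketched as follows. This is only a rough illustration, not the spreadsheet's actual method: it assumes a Normal distribution with Mean 10% and SD 20%, and the names clamp, simple_sample_returns and grow are made up for the example.

```python
import random
from statistics import NormalDist

def clamp(y, eps=1e-12):
    """Keep y strictly inside (0,1), which inv_cdf requires."""
    return min(max(y, eps), 1.0 - eps)

def simple_sample_returns(n_years, mean, sd, rng):
    """Steps 1 and 2: pick y at random in (0,1), read off the return x."""
    dist = NormalDist(mu=mean, sigma=sd)
    return [dist.inv_cdf(clamp(rng.random())) for _ in range(n_years)]

def grow(start, returns):
    """Steps 3 and 4: apply each annual return in turn."""
    value = start
    for r in returns:
        value *= 1.0 + r
    return value

rng = random.Random(1)  # fixed seed only so the run is repeatable
rets = simple_sample_returns(20, 0.10, 0.20, rng)
print("final value of $1 after 20 years:", grow(1.0, rets))
```

Nothing here forces any y-value to land near 0 or 1, which is why a short run can easily miss the tails.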

One problem with this "simple" random sampling is that we often miss the outliers which live on the tails of the distribution, where y is very near 0 or 1, and the corresponding x is very negative (a market crash) or very large (a roaring bull).

Another technique is to use what's called Latin Hypercube sampling.


Fig. 2
Here's what we do:
  1. Divide the interval from 0 to 1 into twenty equal subintervals and pick a y-value in each subinterval.
  2. Determine the corresponding x-value. That's our return.
  3. Apply this return to our investment.
  4. Repeat steps 1, 2 and 3 until we've estimated the value of our portfolio for the next twenty (or thirty or forty) years.
If we repeat this process twenty times (meaning we pick twenty y-values, one within each of the twenty subintervals), we get points on the cumulative distribution, like Fig. 3
>Just twenty?
Or a thousand. It depends upon how many random numbers you want.
>I like it!
Yes, it covers the whole distribution and ...
>Why "Hypercube"?
Well, we're choosing y-values in the interval (0,1). If we selected not a single variable (like y), but many variables (like y1, y2, y3), each lying in the interval (0,1), then the point (y1, y2, y3) lies within a cube in 3-dimensional space. For umpteen y-values (instead of just 3), it's called a Hypercube.
>I'm sorry I asked.

Fig. 3
You haven't asked "Why Latin?".
>I don't want to know.
I think it comes from Latin Squares where there's an array of symbols and each occurs just once ...
>I don't want to know!

Okay. I don't know why Latin, but the technique is a form of what's called "Stratified Sampling".
Notice that, although we subdivide the interval (0,1) into N subintervals of equal length (if we want to generate N pseudo-random numbers), we still have to decide how to select a number within each of these subintervals.
>I'd pick the centre of each.
Normally, in Latin Hypercube sampling for Monte Carlo type simulation, we pick them randomly. That is, if a subinterval runs from y = a to y = b, we choose y = a+(b-a)*R where R is a random number between 0 and 1. That puts the y-value somewhere between a and b. If we do this for our example of size twenty, we get samples which look something like this:

>So there still is some randomness involved, eh?
Very little, but if the number N is large, the distribution of random numbers generated via this Latin Hypercube sampling is the same as the simple sampling mechanism described for typical Monte Carlo simulations ... except fewer samples are required in order that the sample mimic the Mean and Standard Deviation of the distribution.
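The rule y = a+(b-a)*R can be sketched like this. It's a minimal illustration under assumptions: a Normal distribution with Mean 10% and SD 20%, twenty subintervals, and a final shuffle so the returns are applied in random order. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def clamp(y, eps=1e-12):
    """Keep y strictly inside (0,1), which inv_cdf requires."""
    return min(max(y, eps), 1.0 - eps)

def latin_hypercube_y(n, rng):
    """Pick one y-value at random inside each of n equal subintervals of (0,1)."""
    ys = []
    for i in range(n):
        a, b = i / n, (i + 1) / n   # the subinterval runs from y = a to y = b
        r = rng.random()            # R, a random number between 0 and 1
        ys.append(clamp(a + (b - a) * r))
    return ys

rng = random.Random(3)  # fixed seed only so the run is repeatable
ys = latin_hypercube_y(20, rng)

dist = NormalDist(mu=0.10, sigma=0.20)   # assumed Mean and SD
returns = [dist.inv_cdf(y) for y in ys]
rng.shuffle(returns)                     # assumption: apply the returns in random order
print([f"{r:.1%}" for r in returns])
```

Because the lowest subinterval always contributes a y-value below 0.05 and the highest one a y-value above 0.95, every set of twenty returns includes something from each tail.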

>So, are you going to do something with this Latin stuff?
Well, I was thinking of building it into the Monte Carlo spreadsheet mentioned above.
>When?
One of these days ...

In the meantime, consider this:

  1. We consider a Normal Cumulative distribution with Mean = 0 and Standard Deviation = 1.
  2. We pick a sample of 100 numbers from this distribution and compute the Mean and SD of the sample.
  3. We plot (Mean,SD) (for the sample) and expect this point to lie close to (0,1)
  4. We repeat steps 1, 2 and 3 a few times to see where the points (Mean,SD) lie.
  5. We repeat steps 1 - 4 for both Simple and Latin Hypercube sampling.
We get something like this
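A rough way to try the comparison in steps 1 to 5 yourself is sketched below. The number of repetitions (50) and the seed are arbitrary choices, and the helper names are invented for the example.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)   # Mean = 0, Standard Deviation = 1

def clamp(y, eps=1e-12):
    """Keep y strictly inside (0,1), which inv_cdf requires."""
    return min(max(y, eps), 1.0 - eps)

def simple_sample(n, rng):
    """n values, each from a y picked anywhere in (0,1)."""
    return [STD_NORMAL.inv_cdf(clamp(rng.random())) for _ in range(n)]

def latin_sample(n, rng):
    """n values, one from a y picked inside each of n equal subintervals."""
    return [STD_NORMAL.inv_cdf(clamp((i + rng.random()) / n)) for i in range(n)]

def mean_sd_points(sampler, trials, n, rng):
    """Repeat the sampling and collect one (Mean, SD) point per sample."""
    points = []
    for _ in range(trials):
        xs = sampler(n, rng)
        points.append((mean(xs), stdev(xs)))
    return points

rng = random.Random(2024)  # fixed seed only so the run is repeatable
simple_pts = mean_sd_points(simple_sample, 50, 100, rng)
latin_pts = mean_sd_points(latin_sample, 50, 100, rng)

for name, pts in (("Simple", simple_pts), ("Latin Hypercube", latin_pts)):
    means = [m for m, _ in pts]
    sds = [s for _, s in pts]
    print(name, "scatter of Mean:", stdev(means), "scatter of SD:", stdev(sds))
```

The printed scatter figures measure how far the (Mean,SD) points wander from (0,1); the Latin Hypercube points should cluster much more tightly.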

>What about a Log-normal distribution and Monte Carlo simulation and ...
Yeah, well, here are some early results (while we work on the spreadsheet). We do this:

  1. We use a Log-normal distribution with Mean of 10% and Standard Deviation of 20%.
  2. We do a 40-year Monte Carlo simulation and determine the Mean and Standard Deviation of the 40 annual returns.
  3. We repeat step 2 a hundred times and plot (Mean,SD) to compare with (10%,20%).
Here are some samples:

... and another typical distribution picture:
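For the Log-normal case, here's one possible sketch. It rests on an assumption the text doesn't spell out: that the gain factor 1+return is Log-normal, with its parameters chosen so the annual return has Mean 10% and SD 20%. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def lognormal_params(mean_return, sd_return):
    """mu and sigma of ln(1+r), chosen so that 1+r has the requested Mean and SD."""
    m = 1.0 + mean_return
    sigma2 = math.log(1.0 + (sd_return / m) ** 2)
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)

def clamp(y, eps=1e-12):
    """Keep y strictly inside (0,1), which inv_cdf requires."""
    return min(max(y, eps), 1.0 - eps)

def annual_returns(n_years, mu, sigma, rng, latin):
    """One simulated future: n_years returns via Simple or Latin Hypercube sampling."""
    rets = []
    for i in range(n_years):
        r = rng.random()
        y = (i + r) / n_years if latin else r   # stratified or unrestricted y
        rets.append(math.exp(mu + sigma * STD_NORMAL.inv_cdf(clamp(y))) - 1.0)
    return rets

mu, sigma = lognormal_params(0.10, 0.20)
rng = random.Random(40)  # fixed seed only so the run is repeatable
results = {}
for label, latin in (("Simple", False), ("Latin Hypercube", True)):
    pts = []
    for _ in range(100):                        # a hundred 40-year simulations
        rets = annual_returns(40, mu, sigma, rng, latin)
        pts.append((mean(rets), stdev(rets)))
    results[label] = pts
    print(label, "scatter of Mean:", stdev([m for m, _ in pts]))
```

The order of the 40 returns is left unshuffled here because the Mean and SD of a sample don't depend on ordering.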

>So, I guess one needn't do a thousand Monte Carlo simulations and ...
Aah, but there's a dark side to Latin Hypercube sampling:

Suppose we invest $1 in a stock which has a historical Mean of 10% and Standard Deviation of 20% and we do a 30-year Monte Carlo simulation using Simple Sampling.
We'd get portfolio growth which would look like one of these five typical portfolios:
>The typical portfolios are the thin lines, but the thick red line is ... what?
That's what we'd get if we assumed a constant annual return, without variance. It gives a sort of ball park estimate, without variation. It lies somewhere among the various possible portfolio scenarios. It uses a fixed annualized return given by:
(1+Annualized)^2 = (1+Mean)^2 - SD^2

(see the magic formula) which, for Mean = 10% and SD = 20% gives Annualized = 8.2% and a final portfolio of $10.54 so that ...
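The arithmetic behind those two numbers can be checked with a few lines. The function name annualized_return is just a label for the formula above.

```python
import math

def annualized_return(mean, sd):
    """Solve (1+Annualized)^2 = (1+Mean)^2 - SD^2 for Annualized."""
    return math.sqrt((1.0 + mean) ** 2 - sd ** 2) - 1.0

annualized = annualized_return(0.10, 0.20)
final_value = 1.0 * (1.0 + annualized) ** 30   # $1 invested for 30 years

print(f"Annualized = {annualized:.1%}")        # Annualized = 8.2%
print(f"Final portfolio = ${final_value:.2f}") # Final portfolio = $10.54
```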


>Yeah, yeah, but who uses that ... these days? I mean, a fixed return and ...
Hold on! Now we do the same, but with Latin Hypercube sampling where ...
>I can hardly wait!
Pay attention. With Latin Hypercube sampling, our 30 annual returns hardly change from one possible 30-year Monte Carlo future to the next. It's just the ordering of these returns which changes ... and that would certainly affect the result if there were annual deposits or withdrawals. However, with just our $1 investment, the result of applying 30 annual returns, in random order, yields the same final portfolio.
>But the Latin stuff does have some randomness, right?
Very little, so we'd expect to get wildly different portfolios during the 30 years, but only a slight variation in the final portfolio ... like so
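Here's a rough numerical check of that claim, assuming a Normal distribution with Mean 10% and SD 20%, a $1 investment with no deposits or withdrawals, and 200 simulated 30-year futures per method. All names are illustrative.

```python
import math
import random
from statistics import NormalDist, stdev

DIST = NormalDist(mu=0.10, sigma=0.20)   # assumed distribution of annual returns

def clamp(y, eps=1e-12):
    """Keep y strictly inside (0,1), which inv_cdf requires."""
    return min(max(y, eps), 1.0 - eps)

def simple_returns(n, rng):
    return [DIST.inv_cdf(clamp(rng.random())) for _ in range(n)]

def latin_returns(n, rng):
    rets = [DIST.inv_cdf(clamp((i + rng.random()) / n)) for i in range(n)]
    rng.shuffle(rets)                    # random ordering of the stratified returns
    return rets

def final_value(returns, start=1.0):
    """Final portfolio when there are no deposits or withdrawals."""
    return start * math.prod(1.0 + r for r in returns)

rng = random.Random(7)  # fixed seed only so the run is repeatable

# Reordering the same 30 returns leaves the final portfolio unchanged.
one_run = latin_returns(30, rng)
same_final = math.isclose(final_value(one_run), final_value(sorted(one_run)))
print("order matters for the final value?", not same_final)

simple_finals = [final_value(simple_returns(30, rng)) for _ in range(200)]
latin_finals = [final_value(latin_returns(30, rng)) for _ in range(200)]
print("scatter of final portfolio, Simple:", stdev(simple_finals))
print("scatter of final portfolio, Latin Hypercube:", stdev(latin_finals))
```

The path each portfolio takes still depends on the ordering, but the final values from Latin Hypercube sampling should be bunched far more closely than those from Simple Sampling.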


>Is that good ... or bad?
I don't like it. In fact, one of my complaints about the usual Monte Carlo is that it assumes a fixed Mean and Standard Deviation for the next umpteen years, whereas the Mean and Standard Deviation have been quite different, historically, from one decade to the other.
>But "simple sampling" Monte Carlo gives a variety of Means and ...
Yes. A distribution of Means and SDs, eh?
Makes you think, eh?
>So, are you gonna stick "Latin" in the spreadsheet?
Sure. The user can decide what s/he wants to use.

>How do I get the spreadsheet?

Click here
do not pass go
do not collect $200

... but beware. It can give very optimistic scenarios, like so:

>I like optimistic!