Latin Hypercube and sampling stuff

Here's the problem.
 We have a collection of, say, monthly (or yearly) market returns, dating back a jillion months (or years).
 We compute the Mean and Standard Deviation of this collection.
 We assume that the future evolution of this market (a stock, a mutual fund, whatever)
is such that the future monthly (or yearly) returns will have the same Mean and Standard Deviation.
>You're kidding, right?
Pay attention.
 We assume a Cumulative Probability Distribution with the prescribed Mean and SD. It
might look like Fig. 1
where we note that 0.7 (or 70%) of the returns are less than 20% and 0.1 (or 10%) are less than
-16% and 0.5 (or 50%) are less than 10%, etc.
 In general, we pick a y-value between 0 and 1 (like 0.7) and look at the
corresponding x-value (like 20%) and say:
"a fraction y of returns are less
than x%"
>And this curve, the cumulative distribution, it only depends upon the Mean and SD?
Yes, if we assume a Normal or Lognormal distribution ... they're the most popular choices.
 Fig. 1
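
The lookup on the cumulative distribution can be sketched in a few lines of Python. This is only an illustration: it assumes a Normal distribution with Mean = 10% and SD = 20% (picked because they put the 50% point at a 10% return and the 70% point near 20%), and the name return_for is made up here.

```python
from statistics import NormalDist

# Assumed for illustration: Normal returns with Mean = 10% and SD = 20%.
returns = NormalDist(mu=0.10, sigma=0.20)

def return_for(y):
    """Return the x-value such that a fraction y of returns are less than x."""
    return returns.inv_cdf(y)   # inverse of the cumulative distribution

for y in (0.1, 0.5, 0.7):
    print(f"y = {y:.1f}  ->  x = {return_for(y):6.1%}")
```

Going from y to x is just reading the curve sideways: start on the vertical axis, move across to the curve, then drop down to the return.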

Okay, now we describe how to generate a sequence of returns for the future evolution of our
investment.
 1. Pick a number at random, from 0 to 1. Call it y.
 2. Determine the corresponding x-value. That's our return.
 3. Apply this return to our investment.
 4. Repeat steps 1, 2 and 3 until we've estimated the value of our portfolio for the next
twenty (or thirty or forty) years.
For example, if we repeat this process twenty times (meaning we pick twenty y-values
at random, between 0 and 1), we might get points on the cumulative distribution, like Fig. 2
This procedure is what's normally used in Monte Carlo simulations, like the spreadsheet described
here.
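
For the curious, here is a minimal Python sketch of that simple-sampling recipe. It is not the spreadsheet's code. It assumes a Normal distribution with Mean = 10% and SD = 20%, and the names simple_returns and grow are invented for this example.

```python
import random
from statistics import NormalDist

def simple_returns(n_years, mean=0.10, sd=0.20, rng=random):
    """Steps 1 and 2: pick y at random in (0,1), look up the matching return x."""
    dist = NormalDist(mean, sd)
    out = []
    while len(out) < n_years:
        y = rng.random()
        if y > 0.0:                      # inv_cdf needs 0 < y < 1
            out.append(dist.inv_cdf(y))
    return out

def grow(start, returns):
    """Step 3: apply each return in turn to the investment."""
    value = start
    for r in returns:
        value *= 1.0 + r
    return value

rng = random.Random(1)                   # fixed seed so the run is repeatable
returns = simple_returns(20, rng=rng)
print(f"$1 becomes ${grow(1.0, returns):.2f} after 20 years")
```

Change the seed and you get a different twenty-year future, which is the whole point of a Monte Carlo simulation.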
One problem with this "simple" random
sampling is that we often miss the outliers which live on the tails of the distribution,
where y is very near 0 or 1, and the corresponding x is very negative
(a market crash) or very large (a roaring bull).
Another technique is to use what's called Latin Hypercube sampling.
 Fig. 2

Here's what we do:
 1. Divide the interval from 0 to 1 into twenty equal subintervals and pick a y-value in each subinterval.
 2. Determine the corresponding x-value. That's our return.
 3. Apply this return to our investment.
 4. Repeat steps 1, 2 and 3 until we've estimated the value of our portfolio for the next
twenty (or thirty or forty) years.
If we repeat this process twenty times (meaning we pick twenty y-values,
one within each of the twenty subintervals), we get points on the cumulative distribution,
like Fig. 3
>Just twenty?
Or a thousand. It depends upon how many random numbers you want.
>I like it!
Yes, it covers the whole distribution and ...
>Why "Hypercube"?
Well, we're choosing y-values in the interval (0,1). If we selected not a single
variable (like y), but many variables (like y_{1}, y_{2}, y_{3}),
each lying in the interval (0,1), then the point
(y_{1}, y_{2}, y_{3}) lies within a cube in
3-dimensional space. For umpteen y-values (instead of just 3), it's called a Hypercube.
>I'm sorry I asked.
 Fig. 3

You haven't asked "Why Latin?".
>I don't want to know.
I think it comes from Latin Squares where there's an array of symbols and each occurs just once ...
>I don't want to know!
Okay. I don't know why Latin, but the technique is also called "Stratified Sampling".
Notice that we subdivide the interval (0,1) into N subintervals of equal length
(if we want to generate N pseudo-random numbers). But how do we select a number within each of
these subintervals?
>I'd pick the centre of each.
Normally, in Latin Hypercube sampling for Monte Carlo type simulation, we pick them
randomly. That is, if a subinterval runs from y = a to y = b,
we choose y = a+(b-a)*R where R is a random number between 0 and 1.
That puts the y-value somewhere between a and b. If we do this for our
example of size twenty, we get samples which look something like this:
>So there still is some randomness involved, eh?
Very little, but if the number N is large, the distribution of random numbers generated via this
Latin Hypercube sampling is the same as that produced by the simple sampling mechanism described
for typical Monte Carlo simulations ... except fewer samples are required in order that the sample
mimic the Mean and Standard Deviation of the distribution.
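
Here is a rough Python sketch of the one-variable Latin Hypercube recipe, using y = a+(b-a)*R in each subinterval. The function name latin_y_values is made up, the standard Normal distribution is just a convenient test case, and shuffling the y-values into random order is an assumption made here so the returns don't arrive sorted from worst to best.

```python
import random
from statistics import NormalDist, mean, stdev

def latin_y_values(n, rng=random):
    """One y-value in each of n equal subintervals of (0,1), in shuffled order."""
    ys = []
    for i in range(n):
        a, b = i / n, (i + 1) / n            # subinterval runs from a to b
        while True:
            y = a + (b - a) * rng.random()   # y = a + (b-a)*R
            if 0.0 < y < 1.0:                # inv_cdf needs 0 < y < 1
                break
        ys.append(y)
    rng.shuffle(ys)                          # assumption: random ordering
    return ys

rng = random.Random(7)                       # fixed seed so the run is repeatable
dist = NormalDist(0.0, 1.0)
sample = [dist.inv_cdf(y) for y in latin_y_values(20, rng)]
print(f"Mean = {mean(sample):.3f}, SD = {stdev(sample):.3f}")
```

Every subinterval gets exactly one y-value, so the tails near y = 0 and y = 1 are always represented, no matter how the random numbers fall.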
>So, are you going to do something with this Latin stuff?
Well, I was thinking of building it into the Monte Carlo spreadsheet mentioned above.
>When?
One of these days ...
In the meantime, consider this:
 1. We consider a Normal Cumulative distribution with Mean = 0 and Standard Deviation = 1.
 2. We pick a sample of 100 numbers from this distribution and compute the
Mean and SD of the sample.
 3. We plot (Mean,SD) (for the sample) and expect this point to lie close to (0,1)
 4. We repeat steps 1, 2 and 3 a few times to see where the points
(Mean,SD) lie.
 5. We repeat steps 1 to 4 for both Simple and Latin Hypercube sampling.
We get something like this
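
If you'd like to try this experiment yourself, here's a minimal Python sketch. It prints the (Mean,SD) points instead of plotting them. The helper names (draw_y, simple_sample, latin_sample, mean_sd_points) and the choice of ten repeats are assumptions made for this example.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist(0.0, 1.0)            # Mean = 0, Standard Deviation = 1

def draw_y(lo, hi, rng):
    """A random y between lo and hi, redrawn until strictly inside (0,1)."""
    while True:
        y = lo + (hi - lo) * rng.random()
        if 0.0 < y < 1.0:
            return y

def simple_sample(n, rng):
    """Simple sampling: every y is drawn from the whole interval (0,1)."""
    return [STD_NORMAL.inv_cdf(draw_y(0.0, 1.0, rng)) for _ in range(n)]

def latin_sample(n, rng):
    """Latin Hypercube sampling: one y from each of n equal subintervals."""
    return [STD_NORMAL.inv_cdf(draw_y(i / n, (i + 1) / n, rng)) for i in range(n)]

def mean_sd_points(sampler, n=100, repeats=10, seed=0):
    """Repeat the experiment and collect the (Mean, SD) of each sample."""
    rng = random.Random(seed)
    points = []
    for _ in range(repeats):
        s = sampler(n, rng)
        points.append((mean(s), stdev(s)))
    return points

for name, sampler in (("Simple", simple_sample), ("Latin", latin_sample)):
    print(name)
    for m, sd in mean_sd_points(sampler):
        print(f"  Mean = {m:+.3f}   SD = {sd:.3f}")
```

The Latin points should huddle much closer to (0,1) than the Simple ones.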


>What about a Lognormal distribution and Monte Carlo simulation and ...
Yeah, well, here are some early results (while we work on the spreadsheet). We do this:
 1. We use a Lognormal distribution with Mean of 10% and Standard Deviation of 20%.
 2. We do a 40-year Monte Carlo simulation and determine the Mean and Standard Deviation
of the 40 annual returns.
 3. We repeat step 2 a hundred times and plot (Mean,SD) to compare with (10%,20%).
Here are some samples:
... and another typical distribution picture:
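
A hedged Python sketch of this Lognormal experiment follows. It is not the spreadsheet's method. It assumes that "Lognormal with Mean 10% and SD 20%" means the gain factor 1 + return is Lognormal with mean 1.10 and standard deviation 0.20, and the names MU, SIGMA, draw_y and simulate are invented here.

```python
import math
import random
from statistics import NormalDist, mean, stdev

MEAN, SD, YEARS = 0.10, 0.20, 40

# Assumption: 1 + return is Lognormal with mean 1+MEAN and standard deviation SD.
SIGMA = math.sqrt(math.log(1.0 + (SD / (1.0 + MEAN)) ** 2))
MU = math.log(1.0 + MEAN) - SIGMA ** 2 / 2.0
Z = NormalDist(0.0, 1.0)

def lognormal_return(y):
    """The return x such that a fraction y of returns are less than x."""
    return math.exp(MU + SIGMA * Z.inv_cdf(y)) - 1.0

def draw_y(lo, hi, rng):
    """A random y between lo and hi, redrawn until strictly inside (0,1)."""
    while True:
        y = lo + (hi - lo) * rng.random()
        if 0.0 < y < 1.0:
            return y

def simulate(latin, rng):
    """One 40-year simulation: the Mean and SD of its 40 annual returns."""
    if latin:
        ys = [draw_y(i / YEARS, (i + 1) / YEARS, rng) for i in range(YEARS)]
        rng.shuffle(ys)                      # random ordering of the years
    else:
        ys = [draw_y(0.0, 1.0, rng) for _ in range(YEARS)]
    returns = [lognormal_return(y) for y in ys]
    return mean(returns), stdev(returns)

rng = random.Random(2)                       # fixed seed so the run is repeatable
for label, latin in (("Simple", False), ("Latin", True)):
    points = [simulate(latin, rng) for _ in range(100)]
    means = [m for m, _ in points]
    sds = [s for _, s in points]
    print(f"{label:6s} Means {min(means):.1%} to {max(means):.1%}, "
          f"SDs {min(sds):.1%} to {max(sds):.1%}")
```

The printed ranges stand in for the scatter plots: a narrow range means the (Mean,SD) points cluster tightly around (10%,20%).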
>So, I guess one needn't do a thousand Monte Carlo simulations and ...
Aah, but there's a dark side to Latin Hypercube sampling:
Suppose we invest $1 in a stock which has a historical Mean of 10% and Standard Deviation
of 20% and we do a 30-year Monte Carlo simulation using Simple Sampling.
We'd get
portfolio growth which would look like one of these five typical portfolios:
>The typical portfolios are the thin lines, but the thick red line is ... what?
That's what we'd get if we assumed a constant annual return, without variance. It gives a sort of
ball park estimate, without variation. It lies somewhere among the various possible portfolio
scenarios. It uses a fixed annualized return given by:
(1+Annualized)^{2} = (1+Mean)^{2} - SD^{2}
(see the magic formula)
which, for Mean = 10% and SD = 20% gives Annualized = 8.2% and a final portfolio
of $10.54 so that ...
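
The arithmetic behind those two numbers, as a small Python check (the variable names are just for this example):

```python
import math

mean, sd, years = 0.10, 0.20, 30

# (1+Annualized)^2 = (1+Mean)^2 - SD^2
annualized = math.sqrt((1.0 + mean) ** 2 - sd ** 2) - 1.0
final = (1.0 + annualized) ** years          # $1 compounded for 30 years

print(f"Annualized = {annualized:.1%}")      # Annualized = 8.2%
print(f"$1 grows to ${final:.2f}")           # $1 grows to $10.54
```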


>Yeah, yeah, but who uses that ... these days? I mean, a fixed return and ...
Hold on! Now we do the same, but with Latin Hypercube sampling where ...
>I can hardly wait!
Pay attention. With Latin Hypercube sampling, our 30 annual returns hardly change from one
possible 30-year Monte Carlo future to the next. It's just the ordering of these
returns which changes ... and that would certainly affect the result if there were annual
deposits or withdrawals. However, with just our $1 investment, the result of applying 30
annual returns, in random order, yields the same final portfolio.
>But the Latin stuff does have some randomness, right?
Very little, so we'd expect to get wildly different portfolios during the 30 years, but only
a slight variation in the final portfolio ... like so
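
To see this effect in numbers, here's a minimal Python sketch that compares final portfolios under the two sampling schemes. It assumes Normal returns with Mean = 10% and SD = 20%, and the names draw_y and final_portfolio are made up for the example.

```python
import random
from statistics import NormalDist

MEAN, SD, YEARS = 0.10, 0.20, 30
DIST = NormalDist(MEAN, SD)                  # assumption: Normal returns

def draw_y(lo, hi, rng):
    """A random y between lo and hi, redrawn until strictly inside (0,1)."""
    while True:
        y = lo + (hi - lo) * rng.random()
        if 0.0 < y < 1.0:
            return y

def final_portfolio(latin, rng):
    """Final value of $1 after 30 annual returns, with no deposits or withdrawals."""
    if latin:
        ys = [draw_y(i / YEARS, (i + 1) / YEARS, rng) for i in range(YEARS)]
    else:
        ys = [draw_y(0.0, 1.0, rng) for _ in range(YEARS)]
    value = 1.0
    for y in ys:                             # order doesn't matter for the product
        value *= 1.0 + DIST.inv_cdf(y)
    return value

rng = random.Random(5)                       # fixed seed so the run is repeatable
for label, latin in (("Simple", False), ("Latin", True)):
    finals = [final_portfolio(latin, rng) for _ in range(5)]
    print(f"{label:6s}", "  ".join(f"${v:6.2f}" for v in finals))
```

The code multiplies the returns in sorted order for the Latin case, which is fine here because the final product doesn't care about the order. The path along the way does care, and so would any deposits or withdrawals.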



>Is that good ... or bad?
I don't like it. In fact, one of my complaints about the usual Monte Carlo is that it assumes
a fixed Mean and Standard Deviation for the next umpteen years, whereas the
Mean and Standard Deviation
have been quite different, historically, from one decade to the other.
>But "simple sampling" Monte Carlo gives a variety of Means and ...
Yes. A distribution of Means and SDs, eh? Makes you think, eh?
>So, are you gonna stick "Latin" in the spreadsheet?
Sure. The user can decide what s/he wants to use.

>How do I get the spreadsheet?
Click here do not pass go do not collect $200
... but beware. It can give very optimistic scenarios, like so:
>I like optimistic!
