Covariance and Correlation

So there I was, writing up a tutorial on beta, and (while surfing the Net) I ran across something I hadn't realized.

>You're kidding, right? Something you hadn't realized? I can't believe ...
Pay attention.
First, let's talk about Covariance, which gives some measure of how two sets of variables are related.
Suppose that x1, x2, ... xn and y1, y2, ... yn are the two sets.
Suppose, further, that M[x] = (1/n) (x1+x2+ ... +xn) and M[y] = (1/n) (y1+y2+ ... +yn) are the two Means.
Then:

COVAR[x,y] = (1/n) {(x1-M[x])(y1-M[y]) + (x2-M[x])(y2-M[y]) + ... +(xn-M[x])(yn-M[y]) }
which we'll write (for sanitary reasons) as:
COVAR[x,y] = (1/n) Σ(x-M[x])(y-M[y])
It's the average product of two deviations: (x from its Mean) and (y from its Mean).
If xj and yj are both above or both below their Means, then the product is positive and increases the Covariance.
On the other hand, if yj is above its mean when xj is below (or vice versa), their product is negative and that decreases the Covariance.
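
Here's a minimal Python sketch of that formula, using the same (1/n) averaging as above. The function names and the sample numbers are made up for illustration; they aren't from any particular library.

```python
def mean(values):
    """M[x]: the ordinary average, (1/n)(x1 + x2 + ... + xn)."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """COVAR[x,y] = (1/n) * sum of (x - M[x]) * (y - M[y])."""
    if len(xs) != len(ys):
        raise ValueError("the two sets must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0]
ys_same = [2.0, 4.0, 6.0, 8.0]      # above its Mean whenever x is
ys_opposite = [8.0, 6.0, 4.0, 2.0]  # above its Mean whenever x is below

cov_same = covariance(xs, ys_same)          # positive
cov_opposite = covariance(xs, ys_opposite)  # negative
print(cov_same, cov_opposite)
```

When the two sets sit on the same side of their Means the result is positive; flip one of them and the sign flips too.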

We see that a large Covariance means that the x's and y's tend to be on the same side of their respective Means.
A small Covariance ...

>It means that when one is up the other is down, right?
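Roughly speaking, yes, provided "small" means negative. A Covariance near zero means the x's and y's show no consistent tendency either way.
Note, however, that if x and y are measured in metres, then the Covariance is measured in metres².
If x is measured in feet and y is also measured in feet, then the Covariance is measured in feet².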
If x is measured in miles per hour and y in hours, then the Covariance is measured in miles.
However, if x is measured in ...

My point is that the value of Covariance depends upon who's measuring the variables.
I measure x and y in metres and you measure them in feet and we get two different numbers for the Covariance.
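
To see the size of the disagreement, here's a small sketch that computes the Covariance of the same (made-up) measurements in metres and in feet. The conversion constant and the data are assumptions for illustration only.

```python
FEET_PER_METRE = 3.28084  # approximate conversion factor


def covariance(xs, ys):
    """(1/n) * sum of (x - M[x]) * (y - M[y])."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


# Illustrative measurements in metres.
x_m = [1.0, 2.0, 4.0, 7.0]
y_m = [2.0, 3.0, 3.0, 8.0]

# The very same measurements expressed in feet.
x_ft = [FEET_PER_METRE * x for x in x_m]
y_ft = [FEET_PER_METRE * y for y in y_m]

cov_m = covariance(x_m, y_m)     # in metres squared
cov_ft = covariance(x_ft, y_ft)  # in feet squared

# The two numbers differ by the square of the conversion factor.
ratio = cov_ft / cov_m
print(cov_m, cov_ft, ratio)
```

The ratio comes out as the conversion factor squared, which is just the metres² versus feet² business again.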

>So?
So, suppose we measure the deviations (xj-M[x]) and (yj-M[y]) in terms of how many Standard Deviations each variable is, from its Mean.

>Huh?
Okay. I say that x1-M[x] is 2.5 Standard Deviations and x2-M[x] is -1.07 Standard Deviations, etc.
In other words, we use NOT the raw variables x and y but these same variables divided by their Standard Deviations SD[x] and SD[y].
These new variables we'll call u and v, so:

u = x / SD[x]     v = y / SD[y]
Now we calculate the Covariance of these new variables, like so:
COVAR[u,v] = (1/n) Σ(u-M[u])(v-M[v])
But M[u] = M[x] / SD[x] and M[v] = M[y] / SD[y], so u-M[u] = (x-M[x]) / SD[x] and v-M[v] = (y-M[y]) / SD[y], and we get:
COVAR[u,v] = (1/n) Σ(x-M[x])(y-M[y]) / (SD[x] SD[y]) = COVAR[x,y] / (SD[x] SD[y])
and that guy is somebuddy we've seen often. He's r, the Pearson Correlation. (See Best Line fit.)
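
A quick numerical check of that identity, as a sketch. It assumes the Standard Deviation is computed with the same (1/n) averaging as the Covariance (the text doesn't say which convention SD uses), and the data are invented for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """(1/n) * sum of (x - M[x]) * (y - M[y])."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def std_dev(xs):
    """SD[x], assumed here to use the (1/n) convention: sqrt(COVAR[x,x])."""
    return math.sqrt(covariance(xs, xs))


# Illustrative data only.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.0, 8.0]

# Standardized variables: u = x / SD[x], v = y / SD[y].
u = [xi / std_dev(x) for xi in x]
v = [yi / std_dev(y) for yi in y]

cov_uv = covariance(u, v)                           # COVAR[u,v]
r = covariance(x, y) / (std_dev(x) * std_dev(y))    # COVAR[x,y] / (SD[x] SD[y])
print(cov_uv, r)
```

Up to floating-point rounding, the two printed numbers agree.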

>That's interesting ... to you?
Yes! Don't you see?
The Pearson Correlation r is just the garden variety Covariance
... but where we measure deviations from the Mean in terms of "How many Standard Deviations from the Mean?"

>And that's interesting to you?
Don't you see? You can measure in feet and I measure in metres and we'd get the same Covariance if we use those "standardized" variables, where they're divided by their Standard Deviations. Isn't that neat? These standardized variables are dimensionless. They aren't in feet OR metres. They're a ratio of like quantities: feet divided by feet or metres divided by metres ... or whatever.
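
The feet-versus-metres claim can be checked with a short sketch. The conversion constant and the sample data are assumptions for illustration, and SD is again taken with the (1/n) convention.

```python
import math

FEET_PER_METRE = 3.28084  # approximate conversion factor


def correlation(xs, ys):
    """r = COVAR[x,y] / (SD[x] * SD[y]), all computed with (1/n) averaging."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Illustrative measurements in metres, then the same ones in feet.
x_m = [1.0, 2.0, 4.0, 7.0]
y_m = [2.0, 3.0, 3.0, 8.0]
x_ft = [FEET_PER_METRE * x for x in x_m]
y_ft = [FEET_PER_METRE * y for y in y_m]

r_metres = correlation(x_m, y_m)
r_feet = correlation(x_ft, y_ft)
r_mixed = correlation(x_ft, y_m)  # x in feet, y in metres

print(r_metres, r_feet, r_mixed)
```

All three agree (to rounding), even when one variable is in feet and the other in metres, because each conversion factor cancels against the matching Standard Deviation.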

>And you didn't know that? I thought everybuddy knew that.
Thanks.

>So, do you have any examples of this correlation stuff?
Yes, if we consider annual returns for various assets we get this:

>So they're really Covariances, but with the deviations from the Mean measured in Standard Deviations.
Well, yes, but here we don't have to worry about different units, like feet and metres, because the data sets are "Returns" which don't have dimensions. They depend upon the ratios of like quantities, namely prices.

>So they'd be the same numbers if prices were measured in Yen, right?
Yes.
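
A small sketch of why the currency drops out. It assumes simple returns (this price divided by the previous price, minus 1) and a constant exchange rate; the rate and the prices are hypothetical. If the exchange rate moved from period to period, the Yen returns would differ from the dollar returns.

```python
def simple_returns(prices):
    """Return p[t] / p[t-1] - 1 for each consecutive pair of prices."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


YEN_PER_DOLLAR = 150.0  # hypothetical, assumed constant over the whole period

# Illustrative prices in dollars, and the same prices in Yen.
prices_usd = [100.0, 110.0, 99.0, 120.0]
prices_jpy = [YEN_PER_DOLLAR * p for p in prices_usd]

returns_usd = simple_returns(prices_usd)
returns_jpy = simple_returns(prices_jpy)

# The exchange rate cancels in each ratio, so the returns match.
print(returns_usd)
print(returns_jpy)
```

Since the returns are the same in either currency, so is any correlation computed from them.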
Here are examples of returns with high correlation, low correlation, and negative correlation:

[Figure: three example charts labelled High Correlation, Low Correlation, and Negative Correlation]