R-squared and Best Fit: a continuation of Part I

First, we'll recall the magic formulas from Part I:

 M = { N Σxy - Σx Σy } / { N Σx² - (Σx)² }
 K = { Σx² Σy - Σx Σxy } / { N Σx² - (Σx)² }

where we are fitting a straight line, y = Mx + K, to a collection of points (xn, yn) and
Σx = x1 + x2 + ... + xN
and
Σxy = x1y1 + x2y2 + ... + xNyN
etc. etc.
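If you'd rather let a computer grind through those sums, here's a minimal sketch of the Part I formulas (the function name best_fit and the sample points are mine, for illustration only):

```python
def best_fit(xs, ys):
    """Return slope M and intercept K of the least-squares line y = M*x + K."""
    n = len(xs)
    sx = sum(xs)                                  # Σx
    sy = sum(ys)                                  # Σy
    sxy = sum(x * y for x, y in zip(xs, ys))      # Σxy
    sxx = sum(x * x for x in xs)                  # Σx²
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    k = (sxx * sy - sx * sxy) / (n * sxx - sx ** 2)
    return m, k

# Points lying exactly on y = 2x + 1 should give back M = 2, K = 1:
print(best_fit([1, 2, 3, 4], [3, 5, 7, 9]))   # → (2.0, 1.0)
```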

Recall that M and K were chosen so as to minimize the Mean Squared Error:    E1 = (1/N) Σ { yn - (M xn + K) }²

Note, too, that if we divide both numerator and denominator by N², in the expression for M, we get:
M = { (1/N)Σxy - {(1/N)Σx} {(1/N)Σy} } / SD²(x)

where SD(x) is the Standard Deviation of the xn.

Further, if the Standard Deviation for the yn is SD(y), then consider the ratio:
 r = M SD(x)/SD(y)
   = M SD²(x) / { SD(x) SD(y) }
   = { (1/N)Σxy - {(1/N)Σx} {(1/N)Σy} } / { SD(x) SD(y) }
   = { Mean(xy) - Mean(x) Mean(y) } / { SD(x) SD(y) }

>Mamma mia!
Well, maybe it looks better if we write the Average of something with an overbar, like so:     x̄ȳ = (1/N)Σxy     (the bar is meant to stretch over the whole product xy)
Note that x̄ȳ = (1/N) ( x1y1 + x2y2 + ... + xNyN )

 r = { x̄ȳ - x̄·ȳ } / { SD(x) SD(y) }

>Want my opinion? It don't look any better!
Pay attention. Notice that the numerator, Mean(xy) - Mean(x) Mean(y), is the same as (1/N)Σ(xn - x̄)(yn - ȳ): the average product of the deviations of the x's and y's from their means. So, if the deviations from their means are identical, then the numerator is just SD² and so is the denominator,
so the ratio r = +1. Also, if the deviations are the negative, one of the other, then ...

>One of the other?
If, when x is higher than its mean, y is lower than its mean ... by the same amount ... then the numerator is the negative of the denominator,
and the ratio r = -1.

In fact, r is called the Pearson product-moment correlation coefficient, after Karl Pearson, the guy who coined the phrase Standard Deviation.

It measures how correlated the yn are, to the xn. It varies from -1 to +1 and
 r = +1 means perfect (linear) correlation
 r = 0 means no correlation
 r = -1 means perfect inverse correlation
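To see those extreme cases in numbers, here's a sketch using the Mean(xy) - Mean(x) Mean(y) form of the numerator (the function name pearson_r and the data are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    mean = lambda vs: sum(vs) / len(vs)
    mx, my = mean(xs), mean(ys)
    mxy = mean([x * y for x, y in zip(xs, ys)])            # Mean(xy)
    sdx = math.sqrt(mean([x * x for x in xs]) - mx ** 2)   # SD(x)
    sdy = math.sqrt(mean([y * y for y in ys]) - my ** 2)   # SD(y)
    return (mxy - mx * my) / (sdx * sdy)

xs = [1, 4, 5, 10]
same = [x + 7 for x in xs]     # identical deviations from the mean
flip = [7 - x for x in xs]     # deviations are the negatives, one of the other
print(pearson_r(xs, same))     # ≈ +1
print(pearson_r(xs, flip))     # ≈ -1
```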

>So?
So one normally uses the square of r and calls it ...
>R-squared, right?
Right!
>Is there some simple formula for ...?
Sure, we stare at the expression for r, fiddle ... then we come up with:

 R² = { N Σxy - Σx Σy }² / { (N Σx² - (Σx)²) (N Σy² - (Σy)²) }

>That's simple?
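Simple enough for a computer, anyway. Here's a sketch of R-squared computed straight from the raw sums (the name r_squared and the data are mine):

```python
def r_squared(xs, ys):
    """R² = {N Σxy - Σx Σy}² / {(N Σx² - (Σx)²)(N Σy² - (Σy)²)}"""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(r_squared([1, 2, 3, 4], [2, 3, 5, 6]))   # → 0.98
```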

We might also point out that the numerator is the square of the covariance of variables x and y.

>Which numerator?
This guy is the Covariance:

 Covariance(x,y) = (1/N) Σ (xn - x̄)(yn - ȳ) = Mean(xy) - Mean(x) Mean(y)

If {x} and {y} are the same set of numbers, then the covariance is just the Variance of the set. That is, it's the Standard Deviation, squared.
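A two-line sketch of the covariance, checking that Covariance(x,x) is indeed the Variance, the Standard Deviation squared (sample numbers are made up):

```python
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # average product of the deviations from the means:
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [1, 2, 4, 9]
print(covariance(xs, xs))   # Covariance(x,x) = Variance(x) → 9.5
```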

Remember that M, the slope of our best-fit line, is what the financial types call Beta (with the x's the market returns and the y's the stock returns) ... and, as we see, it can be written as:
Covariance(x,y) / Variance(x)     or     {SD(y)/SD(x)} * r
Remember what r is?
>zzzZZZ
Well, R-squared = r², so we have that:

Beta²/R-squared = {SD(y)/SD(x)}²
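Checking that identity numerically (the "market" and "stock" returns below are made-up numbers, not real market data):

```python
import math

xs = [1, 2, 3, 5]   # stand-in market returns
ys = [2, 3, 7, 9]   # stand-in stock returns

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n   # Covariance(x,y)
var_x = sum((x - mx) ** 2 for x in xs) / n                   # Variance(x) = SD²(x)
sdx = math.sqrt(var_x)
sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)

beta = cov / var_x        # slope of the regression line
r = cov / (sdx * sdy)     # Pearson correlation

# Beta²/r² should equal {SD(y)/SD(x)}²:
print(abs(beta ** 2 / r ** 2 - (sdy / sdx) ** 2) < 1e-9)   # → True
```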

In any case, because beta involves the ratio of Standard Deviations (or Volatilities), for a large beta one is tempted to say that the stock has a volatility larger than the S&P 500 ... but the r correlation may be small. For example, if this Pearson correlation coefficient is negative, it means the stock price, though perhaps volatile, tends to move in a direction opposite to the S&P 500. Interesting, eh? But, the most interesting point is this:

Beta is the slope of the regression line
>zzzZZZ
Here are some examples where, in each case, Beta is the slope of the "best-fit" regression line (shown in red):

>Finally! Some pictures!
In the middle chart, the y-values change at roughly twice the x-value rate (note that beta is roughly 2) and, in the rightmost chart ...
>The y-value changes are the negative of the x-value changes.
Roughly. Here's another, where the axes are labelled in percentages, so S&P500 returns run from -15% to +11%
and General Electric returns run from -13% to +19%:

>And beta is 1.15, right?
Right.

>But the points can be very widely scattered - far from the regression line - and still have a beta of 1.0, right?
Right. The points may be close to the regression line ... or far. Here, R-squared plays a role.
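For instance, here's a sketch of two made-up data sets with the same beta but very different scatter about the line (names and numbers are mine; the perturbations are chosen so the slope stays put):

```python
def beta_and_r2(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / vx, cov ** 2 / (vx * vy)   # (Beta, R-squared)

xs = [1, 2, 3, 4, 5]
tight = [x + d for x, d in zip(xs, [0.1, -0.2, 0.2, -0.2, 0.1])]   # hugs the line
scatter = [x + d for x, d in zip(xs, [1, -2, 2, -2, 1])]           # widely scattered

print(beta_and_r2(xs, tight))     # beta ≈ 1, R-squared ≈ 0.99
print(beta_and_r2(xs, scatter))   # beta ≈ 1, R-squared ≈ 0.42
```

Same slope both times, but R-squared tells us how faithfully the points follow that slope.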