R-squared and Best Fit: continuation of Part I

First, we'll recall the magic formulas from Part I:
$$M = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$
$$K = \frac{\sum x^2 \sum y - \sum x \sum xy}{N\sum x^2 - \left(\sum x\right)^2}$$

where we are fitting a straight line, y = Mx + K, to a collection of points $(x_n, y_n)$ and
$$\sum x = x_1 + x_2 + \dots + x_N, \qquad \sum xy = x_1y_1 + x_2y_2 + \dots + x_Ny_N,$$
etc. etc.
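The two formulas translate directly into code. Here is a minimal Python sketch (the function name `fit_line` and the sample points are illustrative, not from the original) that accumulates the sums and returns M and K:

```python
def fit_line(xs, ys):
    """Return (M, K) for the least-squares line y = M*x + K via the sum formulas."""
    n = len(xs)
    sx = sum(xs)                              # Σx
    sy = sum(ys)                              # Σy
    sxx = sum(x * x for x in xs)              # Σx²
    sxy = sum(x * y for x, y in zip(xs, ys))  # Σxy
    denom = n * sxx - sx ** 2                 # zero only if every x is the same
    m = (n * sxy - sx * sy) / denom
    k = (sxx * sy - sx * sxy) / denom
    return m, k


# Points lying exactly on y = 2x + 1 should give back M = 2, K = 1.
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # → (2.0, 1.0)
```

The only failure case is a zero denominator, which happens when all the $x_n$ are equal (a vertical "line" has no finite slope).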


Recall that M and K were chosen so as to minimize the Mean Squared Error:
$$E_1 = \frac{1}{N}\sum \left\{y_n - (M x_n + K)\right\}^2$$
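To see the minimizing property numerically, we can evaluate $E_1$ at the fitted (M, K) and at a few nudged values. A Python sketch, with made-up data and a hypothetical helper `mse` introduced just for illustration:

```python
def fit_line(xs, ys):
    """Least-squares (M, K) from the sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    return (n * sxy - sx * sy) / denom, (sxx * sy - sx * sxy) / denom


def mse(xs, ys, m, k):
    """Mean Squared Error E1 = (1/N) Σ (y_n - (M x_n + K))²."""
    return sum((y - (m * x + k)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative, not real data
m, k = fit_line(xs, ys)
best = mse(xs, ys, m, k)

# Nudging M or K in either direction should only make E1 larger.
for dm, dk in [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    print(dm, dk, mse(xs, ys, m + dm, k + dk) > best)
```

Every line should print True: $E_1$ is a bowl-shaped (quadratic) function of M and K, and the formulas land on the bottom of the bowl.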
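Note, too, that if we divide both numerator and denominator in the expression for M by $N^2$, we get:
$$M = \frac{\frac{1}{N}\sum xy - \left\{\frac{1}{N}\sum x\right\}\left\{\frac{1}{N}\sum y\right\}}{\mathrm{SD}^2(x)}$$
where SD(x) is the Standard Deviation of the $x_n$ (the version that divides by N).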
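A quick numerical check of this rewrite, sketched in Python: the slope from the sum formula and the slope from averages divided by $\mathrm{SD}^2(x)$ should agree. The SD here divides by N (the population version), which is what the algebra produces; the data and helper names are made up for illustration.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Population standard deviation (divides by N, not N - 1)."""
    mu = mean(v)
    return math.sqrt(mean([(a - mu) ** 2 for a in v]))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative data
n = len(xs)

# Slope from the original sum formula.
m_sums = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    n * sum(x * x for x in xs) - sum(xs) ** 2
)

# Slope from averages divided by SD²(x).
mean_xy = mean([x * y for x, y in zip(xs, ys)])
m_avgs = (mean_xy - mean(xs) * mean(ys)) / sd(xs) ** 2

print(m_sums, m_avgs)  # the two agree up to rounding
```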
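Further, if the Standard Deviation for the $y_n$ is SD(y), then consider the ratio:
$$\begin{aligned}
r &= M\,\frac{\mathrm{SD}(x)}{\mathrm{SD}(y)} = \frac{M\,\mathrm{SD}^2(x)}{\mathrm{SD}(x)\,\mathrm{SD}(y)} \\
&= \frac{\frac{1}{N}\sum xy - \left\{\frac{1}{N}\sum x\right\}\left\{\frac{1}{N}\sum y\right\}}{\mathrm{SD}(x)\,\mathrm{SD}(y)} \\
&= \frac{\mathrm{Mean}(xy) - \mathrm{Mean}(x)\,\mathrm{Mean}(y)}{\mathrm{SD}(x)\,\mathrm{SD}(y)}
\end{aligned}$$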

>Mamma mia!
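Messy or not, the last line is easy to compute. A Python sketch on illustrative data (the helper names `mean`, `sd` and `pearson_r` are ours, not the author's) that gets r both as (Mean(xy) - Mean(x) Mean(y)) / (SD(x) SD(y)) and as M SD(x)/SD(y):

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Population standard deviation (divide by N), as in the text."""
    mu = mean(v)
    return math.sqrt(mean([(a - mu) ** 2 for a in v]))


def pearson_r(xs, ys):
    """r = (Mean(xy) - Mean(x) Mean(y)) / (SD(x) SD(y))."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return (mean_xy - mean(xs) * mean(ys)) / (sd(xs) * sd(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative data

# The other route: r = M * SD(x) / SD(y), with M computed from averages.
m = (mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)) / sd(xs) ** 2
print(pearson_r(xs, ys), m * sd(xs) / sd(ys))
```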
Well, maybe it looks better if we write the Average of something with an overbar, like so: $\overline{xy} = \frac{1}{N}\sum xy$.
Note that $\overline{xy} = \frac{1}{N}\left(x_1y_1 + x_2y_2 + \dots + x_Ny_N\right)$.
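Forging ahead, we get:
$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\mathrm{SD}(x)\,\mathrm{SD}(y)} = \frac{\overline{(x - \bar{x})(y - \bar{y})}}{\mathrm{SD}(x)\,\mathrm{SD}(y)}$$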

>Want my opinion? It don't look any better!
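Expanding the product shows why the two numerators match: $\overline{(x-\bar x)(y-\bar y)} = \overline{xy} - \bar x\,\bar y - \bar x\,\bar y + \bar x\,\bar y = \overline{xy} - \bar x\,\bar y$. A small Python check on made-up data:

```python
def mean(v):
    return sum(v) / len(v)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative data
x_bar, y_bar = mean(xs), mean(ys)

# Numerator written with overbars: mean of xy minus product of the means.
overbar_form = mean([x * y for x, y in zip(xs, ys)]) - x_bar * y_bar

# Numerator written as the average product of deviations from the means.
deviation_form = mean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])

print(overbar_form, deviation_form)   # equal up to floating-point rounding
```

The deviation form is the one to reason with in what follows, since it talks directly about how far each point sits from its mean.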
Pay attention. Notice that, if the deviations from their means are identical, then the numerator is just $\mathrm{SD}^2$, and so is the denominator, so the ratio r = +1.
Also, if the deviations are the negative, one of the other, then ...
>One of the other?
If, when x is higher than its mean, y is lower than its mean ... by the same amount ... then the numerator is the negative of the denominator and the ratio r = -1.
In fact, r is called the Pearson product-moment correlation coefficient, after Karl Pearson, the guy who first coined the phrase Standard Deviation.
It measures how correlated the $y_n$ are to the $x_n$.
It varies from -1 to +1, and
- r = +1 means perfect (linear) correlation
- r = 0 means no (linear) correlation
- r = -1 means perfect inverse correlation
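A Python sketch (hypothetical helper `pearson_r`, made-up data) checking the three landmark values. The third case, y = (x - 3)², is a reminder that r = 0 rules out only a linear relationship, not every relationship:

```python
import math


def pearson_r(xs, ys):
    """Pearson r via average products of deviations (population SDs)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    return cov / (sd_x * sd_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(xs, [3 * x + 2 for x in xs]))     # ≈ +1: perfect linear
print(pearson_r(xs, [-3 * x + 2 for x in xs]))    # ≈ -1: perfect inverse
print(pearson_r(xs, [(x - 3) ** 2 for x in xs]))  # 0.0: a pattern, but not a linear one
```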


>So?
So one normally uses the square of r and calls it ...
>R-squared, right?
Right!
>Is there some simple formula for ...?
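Sure, we stare at the expression for r, fiddle ... then we come up with:
$$R^2 = r^2 = \frac{\left(\overline{xy} - \bar{x}\,\bar{y}\right)^2}{\left(\overline{x^2} - \bar{x}^2\right)\left(\overline{y^2} - \bar{y}^2\right)}$$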
>That's simple?
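Simple or not, it is a few lines of code. A Python sketch (helper names hypothetical, data made up) that evaluates the R-squared formula with overbars and compares it with r² computed from deviations:

```python
import math


def mean(v):
    return sum(v) / len(v)


def r_squared(xs, ys):
    """R² = (mean(xy) - x̄ȳ)² / ((mean(x²) - x̄²)(mean(y²) - ȳ²))."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = mean([x * y for x, y in zip(xs, ys)]) - x_bar * y_bar
    var_x = mean([x * x for x in xs]) - x_bar ** 2
    var_y = mean([y * y for y in ys]) - y_bar ** 2
    return cov ** 2 / (var_x * var_y)


def pearson_r(xs, ys):
    """r from deviations about the means (population SDs)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = mean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])
    sd_x = math.sqrt(mean([(x - x_bar) ** 2 for x in xs]))
    sd_y = math.sqrt(mean([(y - y_bar) ** 2 for y in ys]))
    return cov / (sd_x * sd_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # illustrative data
print(r_squared(xs, ys), pearson_r(xs, ys) ** 2)   # agree up to rounding
```

The mean(x²) - x̄² form is fine for small illustrative numbers, but it can lose precision when the values are large compared with their spread; the deviation form in `pearson_r` is the safer way to compute it.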
We might also point out that the numerator is the square of the covariance
of variables x and y.
>Which numerator?
This guy is the Covariance:
$$\mathrm{Cov}(x, y) = \overline{xy} - \bar{x}\,\bar{y} = \overline{(x - \bar{x})(y - \bar{y})}$$
If {x} and {y} are the same set of numbers, then the covariance is just the Variance of the set. That is, it's the Standard Deviation, squared.
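A Python sketch with a hypothetical `covariance` helper (population version, dividing by N): feeding the same list in twice reproduces the variance, and the standard library's `statistics.pvariance` and `statistics.pstdev` serve as an independent check. The returns are made up.

```python
import statistics


def covariance(xs, ys):
    """Population covariance: the average product of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


returns = [0.02, -0.01, 0.03, -0.04, 0.01]   # made-up returns
print(covariance(returns, returns), statistics.pvariance(returns))
print(statistics.pstdev(returns) ** 2)       # the Standard Deviation, squared
```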
One last thing.
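If the set {x} is the set of S&P 500 returns and the set {y} is a set of returns for some stock, we can compare the stock to "the market" ... sorta like $R^2$ ... but now it has a special name, Beta:
$$\mathrm{Beta} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\mathrm{SD}^2(x)}$$
which, as we see, can be written as:
$$\mathrm{Beta} = \frac{\mathrm{Covariance}(x, y)}{\mathrm{Variance}(x)} \quad\text{or}\quad \mathrm{Beta} = \frac{\mathrm{SD}(y)}{\mathrm{SD}(x)}\, r$$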
Remember what r is?
>zzzZZZ
Well, R-squared = $r^2$, so we have that:
$$\frac{\mathrm{Beta}^2}{R^2} = \left\{\frac{\mathrm{SD}(y)}{\mathrm{SD}(x)}\right\}^2$$
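A Python sketch checking these identities on hypothetical monthly returns (the numbers are invented, not real S&P 500 or stock data): Beta as Covariance/Variance, Beta as {SD(y)/SD(x)} r, and Beta²/R² against the squared volatility ratio.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(xs, ys):
    """Population covariance (divide by N)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return mean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])


def sd(v):
    return math.sqrt(cov(v, v))   # covariance with itself is the variance


# Hypothetical monthly returns: x for "the market", y for a stock (made up).
market = [0.010, -0.020, 0.030, 0.005, -0.015, 0.025]
stock = [0.020, -0.030, 0.045, 0.000, -0.010, 0.040]

beta = cov(market, stock) / cov(market, market)     # Covariance / Variance
r = cov(market, stock) / (sd(market) * sd(stock))   # Pearson r
beta_alt = sd(stock) / sd(market) * r               # {SD(y)/SD(x)} * r

print(beta, beta_alt)
print(beta ** 2 / r ** 2, (sd(stock) / sd(market)) ** 2)
```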
In any case, because beta involves the ratio of Standard Deviations (or Volatilities),
for a large beta one is tempted to say that the stock has a volatility larger than
the S&P 500 ... but the correlation r may be small.
For example, if this
Pearson correlation coefficient is negative, it means the stock price, though perhaps volatile,
tends to move in a direction opposite to the S&P 500. Interesting, eh? But the most interesting
point is this:
Beta is the slope of the regression line
>zzzZZZ
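Wake up: this one is easy to check. A Python sketch (hypothetical helpers `fit_line` and `beta`, invented returns) that fits the regression line with the Part I sum formulas and compares its slope with Covariance/Variance:

```python
def fit_line(xs, ys):
    """Least-squares slope M and intercept K from the Part I sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    return (n * sxy - sx * sy) / denom, (sxx * sy - sx * sxy) / denom


def beta(market, stock):
    """Beta = Covariance(market, stock) / Variance(market), population versions."""
    n = len(market)
    m_bar, s_bar = sum(market) / n, sum(stock) / n
    cov = sum((x - m_bar) * (y - s_bar) for x, y in zip(market, stock)) / n
    var = sum((x - m_bar) ** 2 for x in market) / n
    return cov / var


# Made-up returns (x = market, y = stock); not real S&P 500 or stock data.
market = [0.010, -0.020, 0.030, 0.005, -0.015, 0.025]
stock = [0.020, -0.030, 0.045, 0.000, -0.010, 0.040]

slope, intercept = fit_line(market, stock)
print(slope, beta(market, stock))   # same number, up to rounding
```

Whether covariance and variance divide by N or by N - 1 does not matter for beta, as long as both use the same choice: the factor cancels in the ratio.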
Here are some examples where, in each case, Beta is the slope of the "best-fit" regression line
(shown in red):
>Finally! Some pictures!
In the middle chart, the y-values change at roughly twice the rate of the x-values (note that beta is roughly 2)
and, in the rightmost chart ...
>The y-value changes are the negative of the x-value changes.
Roughly. Here's another, where the axes are labelled in percentages, so S&P 500 returns run from
-15% to +11% and General Electric returns run from -13% to +19%:
>And beta is 1.15, right?
Right.
>But the points can be very widely scattered, far from the
regression line, and still have a beta of 1.0, right?
Right. The points may be close to the regression line ... or far. Here, R-squared plays a role.
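A Python sketch (made-up data, hypothetical helper `beta_and_r2`) of two data sets with the same beta of 1.0 but very different R-squared. The "wiggle" added to y sums to zero and is uncorrelated with x, so it leaves the slope alone and only scatters the points:

```python
def mean(v):
    return sum(v) / len(v)


def cov(xs, ys):
    """Population covariance (divide by N)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return mean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])


def beta_and_r2(xs, ys):
    """Return (beta, R-squared) for y regressed on x."""
    c = cov(xs, ys)
    return c / cov(xs, xs), c * c / (cov(xs, xs) * cov(ys, ys))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
wiggle = [1.0, -2.0, 0.0, 2.0, -1.0]   # sums to 0 and is uncorrelated with xs

tight = [x + 0.1 * w for x, w in zip(xs, wiggle)]   # points hug the line y = x
loose = [x + 2.0 * w for x, w in zip(xs, wiggle)]   # points scattered far from it

print(beta_and_r2(xs, tight))   # beta ≈ 1.0, R² ≈ 0.99
print(beta_and_r2(xs, loose))   # beta ≈ 1.0, R² ≈ 0.2
```

Beta only says how steep the best-fit line is; R-squared says how tightly the points cluster around it.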
And y'all looky here
In this case we're plotting something against itself! Of course, the match is perfect.
>And the Covariance number?
That's now the Variance of the data. Take its square root, about 0.047 (or 4.7%), and you've got
the Standard Deviation (or Volatility) of the data (which could be stock returns).
>Volatility is the same as risk, right?
Wrong! Haven't you learned anything about
risk?


