The "Best Line" Fit to Data, the Error, and Beta: a continuation of Part II

We'll summarize what we found in Parts I and II:
We're fitting a straight line, y = Mx + K, to a collection of points
(x_{n}, y_{n}) ... called the Regression Line.
We're using the notation:
Σx = x_{1} + x_{2} + ... + x_{N}
and
Σxy = x_{1}y_{1} + x_{2}y_{2} + ... + x_{N}y_{N}
etc.
M and K are chosen so as to minimize the Mean Squared Error:
Error^{2} = (1/N)Σ{y_{n} - (M x_{n} + K)}^{2}

That requirement gives:
M = {N Σxy - Σx Σy} / {N Σx^{2} - (Σx)^{2}}
K = {Σx^{2} Σy - Σx Σxy} / {N Σx^{2} - (Σx)^{2}}
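As a minimal sketch of the two sum formulas for M and K, here they are in Python. The function name `fit_line` and the sample points are made up for illustration; they aren't from Parts I or II.

```python
def fit_line(xs, ys):
    """Return (M, K) for the least-squares line y = M*x + K."""
    n = len(xs)
    sx = sum(xs)                               # Σx
    sy = sum(ys)                               # Σy
    sxy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sxx = sum(x * x for x in xs)               # Σx^2
    denom = n * sxx - sx ** 2                  # N Σx^2 - (Σx)^2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sxy - sx * sy) / denom
    k = (sxx * sy - sx * sxy) / denom
    return m, k


# Made-up points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
M, K = fit_line(xs, ys)
print(M, K)  # prints 2.0 1.0
```

The guard on `denom` covers the one case where the formulas break down: every x is the same, so the "best line" would be vertical.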

 Figure 1

We saw that the slope of the "best fit" line can be written:
[1]
M = COVAR[x,y] / SD^{2}[x]
where COVAR[x,y] = (1/N)Σxy - {(1/N)Σx}{(1/N)Σy} = Mean[xy] - Mean[x]Mean[y] is the Covariance of x and y
and SD^{2}[x] = (1/N)Σx^{2} - {(1/N)Σx}^{2} = Mean[x^{2}] - (Mean[x])^{2} is the Variance or (Standard Deviation)^{2} of the set of x's
(See statstuff.htm#3)
K = {Mean[x^{2}]Mean[y] - Mean[x]Mean[xy]} / SD^{2}[x]
  = {(SD^{2}[x] + (Mean[x])^{2})Mean[y] - Mean[x](COVAR[x,y] + Mean[x]Mean[y])} / SD^{2}[x]
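Multiplying out that last expression for K, the (Mean[x])^{2}Mean[y] terms cancel and what's left is K = Mean[y] - M Mean[x]. Here is a small numerical check of that, again with made-up data and hypothetical helper names (`mean`, `covar`, `variance`):

```python
def mean(values):
    return sum(values) / len(values)


def covar(xs, ys):
    # COVAR[x,y] = Mean[xy] - Mean[x]Mean[y]
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def variance(xs):
    # SD^2[x] = Mean[x^2] - (Mean[x])^2
    return mean([x * x for x in xs]) - mean(xs) ** 2


# Made-up data, not taken from any figure
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.0, 6.0]

M = covar(xs, ys) / variance(xs)

# K written out in terms of the means, as in the text ...
mean_x2 = mean([x * x for x in xs])
mean_xy = mean([x * y for x, y in zip(xs, ys)])
K_long = (mean_x2 * mean(ys) - mean(xs) * mean_xy) / variance(xs)

# ... and the simplified form
K_short = mean(ys) - M * mean(xs)

print(M, K_long, K_short)
```

The short form says the "best line" always passes through the point (Mean[x], Mean[y]).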


>Why are you doing this again?
Although we've identified the "best line" fit to the data, we failed to determine the minimum error.
>The minimum error?
Yes, the minimum of Error ... remember? The slope and intercept of the "best line" (that's M and K) were chosen to minimize Error.
So we write:
Error^{2} = (1/N)Σ{y - (M x + K)}^{2} where we're dropping the subscripts for sanitary reasons
  = (1/N)Σ{y^{2} - 2y(M x + K) + (M x + K)^{2}}
  = (1/N)Σy^{2} - 2(M/N)Σxy - 2(K/N)Σy + (M^{2}/N)Σx^{2} + (2MK/N)Σx + (K^{2}/N)Σ(1)

>Ugh.
Do you see all those Means?
>Ugh!
The Error can be expressed in terms of five Means.
In fact, Error can be expressed in terms of the statistical parameters of the x- and y-sets and their Covariance ... like so:
To simplify we'll let:
Mean[x] =X, SD[x] = A
Mean[y] =Y, SD[y] = B
COVAR[x,y] = C
then, using [1]:
Mean[xy] = COVAR[x,y] + Mean[x]Mean[y] = C + XY
Mean[x^{2}] = SD^{2}[x] + (Mean[x])^{2} = A^{2} + X^{2}
Mean[y^{2}] = SD^{2}[y] + (Mean[y])^{2} = B^{2} + Y^{2}
so we can write
M = C / A^{2}
K = ((A^{2}+X^{2})Y - X(C+XY)) / A^{2} = (A^{2}Y - CX) / A^{2}

so
Error^{2} = (1/N)Σy^{2} - 2(M/N)Σxy - 2(K/N)Σy + (M^{2}/N)Σx^{2} + (2MK/N)Σx + (K^{2}/N)Σ(1)
  = (B^{2}+Y^{2}) - 2(C/A^{2})(C+XY) - 2(A^{2}Y - CX)Y/A^{2} + (C/A^{2})^{2}(A^{2}+X^{2}) + 2(C/A^{2})(A^{2}Y - CX)X/A^{2} + (A^{2}Y - CX)^{2}/A^{4}
where Σ(1) = 1+1+1+...+1 = N

>zzzZZZ
Patience! We just simplify. Lots of stuff cancels out and we get (finally!) Error^{2} = B^{2} - C^{2} / A^{2}.
We place this in a position of eminence:
[2]
Error^{2} = SD^{2}[y] - COVAR^{2}[x,y] / SD^{2}[x]
  = SD^{2}[y] (1 - COVAR^{2}[x,y] / (SD^{2}[x] SD^{2}[y]))
  = SD^{2}[y] (1 - r^{2})
where r = COVAR[x,y] / (SD[x] SD[y]) is the Pearson correlation.
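If all that cancelling seems too good to be true, a numerical check may help. The sketch below computes the Mean Squared Error directly from the residuals and compares it with the two closed forms. The data are made up, and the variable names (`A2` for SD^{2}[x], `B2` for SD^{2}[y]) are just shorthand chosen for this example.

```python
import math


def mean(values):
    return sum(values) / len(values)


# Made-up data, not taken from any figure
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.0, 6.0]

X, Y = mean(xs), mean(ys)
A2 = mean([x * x for x in xs]) - X ** 2                 # SD^2[x]
B2 = mean([y * y for y in ys]) - Y ** 2                 # SD^2[y]
C = mean([x * y for x, y in zip(xs, ys)]) - X * Y       # COVAR[x,y]

M = C / A2
K = (A2 * Y - C * X) / A2

# Mean squared error straight from the definition
mse_direct = mean([(y - (M * x + K)) ** 2 for x, y in zip(xs, ys)])

# The two closed forms
mse_cov = B2 - C ** 2 / A2
r = C / math.sqrt(A2 * B2)
mse_r = B2 * (1 - r ** 2)

print(mse_direct, mse_cov, mse_r)
```

All three numbers should agree up to floating-point round-off.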


>Huh?
The square of r is called ...
>R-squared?
Yes. If the correlation r = 1 or -1, then the Error is zero.
The points (x_{1}, y_{1}), (x_{2}, y_{2}) etc. lie right on that "best line".
 r = +1 means perfect (linear) correlation
 r = 0 means no correlation
 r = -1 means perfect inverse correlation


>And for zero correlation then ... uh ...
Then the Error is just the Standard Deviation of the set of y's.
For example, stare at the charts in Figure 2.
The values of x_{1}, x_{2}, etc. are the same for both charts.
In fact, the Pearson correlation is also the same for both charts (namely r = 0.99).
The difference is in the volatility of the set of y's:
SD[y] is larger for the lower chart ... hence the Error is larger.
In fact, it's larger in proportion to the Standard Deviation.
(But that's just because r happens to be the same for both charts.)
In general, changing the Standard Deviation of the y's will change r as well, so we can't conclude that
the Error is smaller just because r is larger.
>But it helps.
Yes. It helps.
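Here's a sketch of the "same r, bigger SD[y]" situation. It is only an imitation of the charts: the second set of y's is assumed to be the first set multiplied by 3, which keeps r the same and triples SD[y]. The data and the names `fit_stats`, `ys_calm` and `ys_wild` are invented for the example.

```python
import math


def fit_stats(xs, ys):
    """Return (r, SD[y], Error) using the (1/N) formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    r = cov / math.sqrt(var_x * var_y)
    # Clamp at zero to guard against tiny negative round-off
    error = math.sqrt(max(0.0, var_y * (1.0 - r * r)))
    return r, math.sqrt(var_y), error


xs = [1.0, 2.0, 4.0, 5.0]
ys_calm = [2.0, 3.0, 3.0, 6.0]
ys_wild = [3.0 * y for y in ys_calm]   # assumption: a scaled copy of ys_calm

r_calm, sd_calm, err_calm = fit_stats(xs, ys_calm)
r_wild, sd_wild, err_wild = fit_stats(xs, ys_wild)

print(r_calm, r_wild)          # same correlation
print(sd_wild / sd_calm)       # SD[y] ratio
print(err_wild / err_calm)     # Error ratio, same as the SD[y] ratio
```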
 Figure 2

>So that Error, SD^{2}[y] (1 - r^{2})
... does it have a name?
Uh ... not that I know of. How about calling it Error?
>Very funny.
Besides, we're calling SD^{2}[y] (1 - r^{2}) the Error^{2}
because it's the Mean Squared Error.
One thing that's a little bothersome is that the Error isn't symmetrical in x and y.
>Huh?
Although the correlation r is unchanged if the x's and y's are switched, the
Error does change. That seems strange, doesn't it? I mean, if you want to know the error in fitting a straight
line to (x,y) data, why should the resultant error depend upon which variable you choose as x and which as y? Taking the vertical distance to that "best line" gives
to the y's a special role.
We could introduce symmetry by calculating the average of the two Errors, when the x's and y's are switched:
symmetrical Error^{2} = (1/2)(SD^{2}[x] + SD^{2}[y])(1 - r^{2})
Or (and this one I like better), we could take as Error^{2} the Mean Squared perpendicular distance of the
points to that "best line". That'd give another symmetrical Error:
[4]
Error^{2} = (1 - r^{2})SD^{2}[x]SD^{2}[y] / (SD^{2}[x] + SD^{2}[y])

 Figure 3

>So does that Error have a name?
Uh ... not that I know of. How about calling it another ...?
>Forget it. So ... how about the slope and intercept?
You mean M and K? So, what about them?
>Do they change when you interchange x and y?
Uh ... good point. They do change.
Since the covariance, C, doesn't change, M is either C/A^{2} = COVAR[x,y]/SD^{2}[x] or C/B^{2} = COVAR[x,y]/SD^{2}[y].
So the moral of this story is just this:
When doing "best line" fit to (x,y) data, be wise in choosing which variable is x and which is y.
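To see what switching x and y does, here's a rough sketch. It fits the line both ways on made-up data and also checks the averaged (symmetrical) Error^{2}. It does not attempt the perpendicular-distance version. The function name `fit` is hypothetical.

```python
import math


def fit(xs, ys):
    """Fit ys against xs; return (slope, Error^2, r, SD^2 of xs, SD^2 of ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    slope = cov / var_x
    error_sq = var_y - cov ** 2 / var_x
    r = cov / math.sqrt(var_x * var_y)
    return slope, error_sq, r, var_x, var_y


# Made-up data
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.0, 6.0]

slope_yx, err_sq_yx, r_yx, var_x, var_y = fit(xs, ys)   # y against x
slope_xy, err_sq_xy, r_xy, _, _ = fit(ys, xs)           # x against y (roles switched)

symmetric_avg = 0.5 * (err_sq_yx + err_sq_xy)
symmetric_formula = 0.5 * (var_x + var_y) * (1 - r_yx ** 2)

print(r_yx, r_xy)                # the correlation doesn't care about the order
print(slope_yx, slope_xy)        # the slopes differ
print(err_sq_yx, err_sq_xy)      # and so do the Errors
print(symmetric_avg, symmetric_formula)
```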

There's this other thing called Beta, namely:
[5]
Beta[x,y] = COVAR[x,y] / SD^{2}[x]

It's used to determine whether two time series (say the monthly S&P 500 returns and the returns for Microsoft) tend to move up or down together and ...
>Hey! That Beta is just the slope of that "best line" fit ... isn't it?
Yes. Beta[x,y] = M.
If the two sets, x and y, are daily (weekly? monthly?) Returns*, then a Beta[x,y] of 1.5 means that the increase in the y-Returns tends to be 1.5 times the
increase in the x-Returns. That means that the y-Returns tend to change more than the x-Returns. That means ...
>But that Beta depends upon which set of returns you choose for x and y, right?
Yes. It could be COVAR[x,y] / SD^{2}[x] or COVAR[x,y] / SD^{2}[y], so one normally uses Beta to determine the relationship
of stock returns to the Market ... in which case the x-Returns are, say, the TSE300 or the S&P500 or some other "benchmark" set of returns.
>So Beta = 1.5 means your stock is 1.5 times more volatile than the market, eh?
Uh ... not exactly. It's SD that measures volatility, not Beta.
>But I've read that "Beta measures the volatility of a stock compared to the volatility of the market".
Yeah, I've read that too. (Check google.)
In fact, Beta measures how changes in returns are related ... not the returns themselves.
Remember, the slope is (change in y) / (change in x).
For example, if Market returns increase from 8% to 10% (that's a 2% increase) then you might expect your stock returns to increase by 1.5(2) = 3%.
>Assuming Beta = 1.5, right?
Right. We could say that the stock will participate in Market moves, but more or less, depending upon Beta.
 Figure 4

In fact, we have:
[6]
Beta[x,y] = r SD[y] / SD[x]

since COVAR[x,y] = r SD[x] SD[y].
So Beta is the ratio of volatilities multiplied by the Pearson correlation.
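A quick sketch of [6] in Python, using invented percentage returns for a "market" and a "stock" (these are not real prices, and the names `market`, `stock` and `moments` are just for this example). It computes Beta as COVAR / SD^{2} and compares it with r times the ratio of volatilities.

```python
import math


def moments(xs, ys):
    """Return (SD[x], SD[y], COVAR[x,y]) using the (1/N) formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return math.sqrt(var_x), math.sqrt(var_y), cov


# Invented returns, in percent
market = [1.0, -0.5, 0.8, -1.2, 0.4]
stock = [1.6, -0.2, 0.5, -2.0, 1.1]

sd_x, sd_y, cov = moments(market, stock)

beta_from_cov = cov / sd_x ** 2         # [5]
r = cov / (sd_x * sd_y)                 # Pearson correlation
vol_ratio = sd_y / sd_x                 # ratio of volatilities
beta_from_r = r * vol_ratio             # [6]

print(beta_from_cov, beta_from_r, vol_ratio)
```

Since r is never bigger than 1 in size, Beta can never be bigger in size than the ratio of volatilities.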
Here are some examples:
However, if the "best fit" line happens to pass through the origin, then the slope is y / x.
>Pass through the origin? That means the Intercept = 0, right?
Yes, and the intercept also has a name. It's called ...
>It's called K, right?
Well ... uh, investment gurus call it Alpha.
Notice an interesting thing: if we're measuring the Beta of a set of returns with itself (so y = x)
then Beta[x,x] = COVAR[x,x] / SD^{2}[x] = 1.
(That just says that the "best fit" line, namely y = x, has slope = 1.)
That means that Beta of the Market is 1 ... since we'd be comparing the Market with itself!
Bloomberg (and others) define Beta as the slope of the "best line" fit when you plot excess returns: the stock against the Market.
>Excess?
Yes, the actual return less some risk-free return such as Money Market or maybe T-bills ... but we'll stick with the actual returns and forget using the excess.
>Does it make a difference?
Not really. Subtracting a constant risk-free rate, C, from returns will give the same value for
Beta = COVAR[x,y] / SD^{2}[x] ... since neither COVAR[x,y] nor SD[x] will change when x and y are replaced by x-C and y-C.
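Both of those claims (Beta of a set with itself is 1, and a constant risk-free rate drops out) are easy to check. The sketch below uses invented returns and a made-up constant rate called `risk_free`; the function name `beta` is also just for illustration.

```python
def beta(xs, ys):
    """Beta[x,y] = COVAR[x,y] / SD^2[x], using the (1/N) formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return cov / var_x


# Invented returns, in percent
market = [1.0, -0.5, 0.8, -1.2, 0.4]
stock = [1.6, -0.2, 0.5, -2.0, 1.1]
risk_free = 0.02   # made-up constant rate, same units as the returns

beta_self = beta(market, market)
beta_actual = beta(market, stock)
beta_excess = beta([x - risk_free for x in market],
                   [y - risk_free for y in stock])

print(beta_self)                  # the Market against itself
print(beta_actual, beta_excess)   # actual vs excess returns
```

The catch is the word "constant": if the risk-free rate changes from period to period, the two Betas needn't agree exactly.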
>How about an example?
Consider Microsoft vs the S&P 500 (which is our Market).
We look at the daily returns over the last few weeks and get a "best fit" line like Figure 5.
>So Beta = 1.097 ... so MSFT is 1.097 times more volatile, right?
We'll check. The parameters turn out to be:
 Correlation r = 0.4818
 SD[MSFT] = 1.520%, SD[S&P] = 0.668%
 Ratio of Volatilities = SD[MSFT] / SD[S&P] = 1.520 / 0.668 = 2.275
... so MSFT was 2.275 times more volatile, over the past few weeks
 Beta = 1.097 (the slope of the "best fit" line)
Note that Beta = r SD[MSFT] / SD[S&P] = (0.4818)(2.275) = 1.096, which matches the slope of 1.097 up to rounding of the inputs.
>So why do they say that Beta is a comparison of volatilities?
Probably because it provides a simple explanation ... even though it ain't true.
 Figure 5

>But if Beta is less than 1, then surely the stock is less volatile than the market, eh?
Think so?
Figure 6 shows the daily returns over the past few weeks for Eastman Kodak versus the S&P.
Beta is less than 1 (it's 0.81) so one might conclude that EK was less volatile than the S&P.
In fact, EK was twice as volatile! Look at that upper chart.
>The ratio of volatilities was 2?
Pretty close. SD[EK] = 1.377% and SD[S&P] = 0.668% and 1.377 / 0.668 = 2.06
In fact, the Correlation between EK and S&P is r = 0.393
which is pretty small (hence decreases the value of Beta). That's why Beta = (0.393) * (Ratio of Volatilities) is smaller than the ratio of volatilities.
>Beta is small ... even tho' the ratio of volatilities is large, eh?
Exactly.
>So why do they say that Beta is a comparison of volatilities?
I give up. Why?
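For anyone who wants to redo the arithmetic, here is a small sketch that plugs in the numbers quoted above for MSFT and EK. The inputs are the rounded values from the text, so the results only match the chart slopes approximately. The function name `beta_from_parts` is made up.

```python
def beta_from_parts(r, sd_stock, sd_market):
    """Return (ratio of volatilities, Beta) where Beta = r * SD[stock] / SD[market]."""
    vol_ratio = sd_stock / sd_market
    return vol_ratio, r * vol_ratio


SD_SP = 0.668   # SD[S&P] in percent, as quoted in the text

ratio_msft, beta_msft = beta_from_parts(0.4818, 1.520, SD_SP)
ratio_ek, beta_ek = beta_from_parts(0.393, 1.377, SD_SP)

print(round(ratio_msft, 3), round(beta_msft, 3))
print(round(ratio_ek, 3), round(beta_ek, 3))
```

In both cases the stock was about twice as volatile as the S&P, yet Beta comes out near 1 for MSFT and below 1 for EK, because the correlation is well under 1.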
 Figure 6

* Beta is usually calculated using Monthly returns ... I think!
See also Capital Asset Pricing Model
and CAPM & Sharpe
and CAPM spreadsheet
and Beta
