The "Best Line" Fit to Data, the Error, and Beta ... a continuation of Part II

We'll summarize what we found in Parts I and II:

We're fitting a straight line, y = Mx + K, to a collection of points (xn, yn) ... called the Regression Line.
We're using the notation:
Σx = x1 + x2 + ... + xN
and
Σxy = x1y1 + x2y2 + ... + xNyN
etc. etc.

M and K are chosen so as to minimize the Mean Squared Error:
 Error2 = (1/N)Σ{yn - (M xn + K) }2

That requirement gives:
 M = { N Σxy - Σx Σy } / { N Σx2 - ( Σx )2 }
 K = { Σx2 Σy - Σx Σxy } / { N Σx2 - ( Σx )2 }
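As a sanity check, here's a small Python sketch that computes M and K straight from those formulas (the data points are made up for illustration):

```python
# Least-squares slope M and intercept K from the Σ-formulas above.
# The data points are made up for illustration (roughly y = 2x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

N = len(xs)
Sx = sum(xs)                                  # Σx
Sy = sum(ys)                                  # Σy
Sxy = sum(x * y for x, y in zip(xs, ys))      # Σxy
Sxx = sum(x * x for x in xs)                  # Σx2

M = (N * Sxy - Sx * Sy) / (N * Sxx - Sx ** 2)
K = (Sxx * Sy - Sx * Sxy) / (N * Sxx - Sx ** 2)
print(M, K)
```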

Figure 1
We saw that the slope of the "best fit" line can be written:
 [1]   M = COVAR[x,y] / SD2[x]
where COVAR[x,y] = (1/N)Σxy - {(1/N)Σx} { (1/N)Σy} = Mean[xy] - Mean[x]Mean[y] is the Covariance of x and y
and SD2[x] = (1/N)Σx2 - {(1/N)Σx}2 = Mean[x2] - (Mean[x])2 is the Variance or (Standard Deviation)2 of the set of x's

(See stat-stuff.htm#3)

 K = {Mean[x2]Mean[y] - Mean[x]Mean[xy]} / SD2[x]
   = {(SD2[x] + (Mean[x])2)Mean[y] - Mean[x](COVAR[x,y] + Mean[x]Mean[y])} / SD2[x]
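The covariance form gives the same line as the Σ-formulas; here's a quick sketch (with made-up data points) computing M and K from the Means:

```python
# M and K via Means, Covariance and Variance (formula [1] above).
# The data points are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

mean = lambda v: sum(v) / len(v)
mx, my = mean(xs), mean(ys)
mxy = mean([x * y for x, y in zip(xs, ys)])   # Mean[xy]
mxx = mean([x * x for x in xs])               # Mean[x2]

covar = mxy - mx * my        # Mean[xy] - Mean[x]Mean[y]
var_x = mxx - mx ** 2        # Mean[x2] - (Mean[x])^2

M = covar / var_x
K = (mxx * my - mx * mxy) / var_x
print(M, K)
```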

>Why are you doing this again?
Although we've identified the "best line" fit to the data, we failed to determine the minimum error.

>The minimum error?
Yes, the minimum of Error ... remember? The slope and intercept of the "best line", that's M and K, were chosen to minimize the Error.

So we write:
 Error2 = (1/N)Σ{y - (Mx + K)}2     where we're dropping the subscripts for sanitary reasons
        = (1/N)Σ{y2 - 2y(Mx + K) + (Mx + K)2}
        = (1/N)Σy2 - 2(M/N)Σxy - 2(K/N)Σy + (M2/N)Σx2 + (2MK/N)Σx + (K2/N)Σ(1)

>Ugh.
Do you see all those Means?
>Ugh!
The Error can be expressed in terms of five Means.
In fact, Error can be expressed in terms of the statistical parameters of the x- and y-sets and their Covariance ... like so:
To simplify we'll let:
 Mean[x] = X,   SD[x] = A
 Mean[y] = Y,   SD[y] = B
 COVAR[x,y] = C
then, using [1]:
 Mean[xy] = COVAR[x,y] + Mean[x]Mean[y] = C + XY
 Mean[x2] = SD2[x] + (Mean[x])2 = A2 + X2
 Mean[y2] = SD2[y] + (Mean[y])2 = B2 + Y2
so we can write
 M = C / A2
 K = ((A2 + X2)Y - X(C + XY)) / A2 = (A2Y - CX) / A2

so
 Error2 = (1/N)Σy2 - 2(M/N)Σxy - 2(K/N)Σy + (M2/N)Σx2 + (2MK/N)Σx + (K2/N)Σ(1)     where Σ(1) = 1+1+1+...+1 = N
        = (B2+Y2) - 2(C/A2)(C+XY) - 2(A2Y - CX)Y/A2 + (C/A2)2(A2+X2) + 2(C/A2)(A2Y - CX)X/A2 + (A2Y - CX)2/A4
        = B2 - C2/A2     after collecting terms (the Y2's, the XY's and the X2's all cancel)
        = B2(1 - r2)     where r = C/(AB) = COVAR[x,y] / (SD[x]SD[y]) is the Pearson correlation

That is, the minimum Error2 = SD2[y](1 - r2).
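As a numerical sanity check (with made-up data points), the brute-force Mean Squared Error agrees with SD2[y](1 - r2), where r = COVAR[x,y] / (SD[x]SD[y]) is the Pearson correlation:

```python
# Check that the brute-force Mean Squared Error for the "best line"
# equals SD2[y](1 - r2).  The data points are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

mean = lambda v: sum(v) / len(v)
X, Y = mean(xs), mean(ys)
A2 = mean([x * x for x in xs]) - X ** 2              # SD2[x]
B2 = mean([y * y for y in ys]) - Y ** 2              # SD2[y]
C = mean([x * y for x, y in zip(xs, ys)]) - X * Y    # COVAR[x,y]

M = C / A2                                           # best-line slope
K = Y - M * X                                        # best-line intercept
mse = mean([(y - (M * x + K)) ** 2 for x, y in zip(xs, ys)])

r2 = C * C / (A2 * B2)                               # squared correlation
print(mse, B2 * (1 - r2))                            # the two agree
```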

>R-squared?
Yes. If the correlation r = 1 or -1, then the Error is zero. The points (x1, y1), (x2, y2) etc. lie right on that "best line".
 r = +1 means perfect (linear) correlation
 r = 0 means no correlation
 r = -1 means perfect inverse correlation

>And for zero correlation then ... uh ...
Then the Error is just the Standard Deviation of the set of y's.

For example, stare at the charts here
The values of x1, x2, etc. are the same for both charts.
In fact, the Pearson correlation is also the same for both charts (namely r = 0.99).
The difference is in the volatility of the set of y's:
SD[y] is larger for the lower chart ... hence the Error is larger.

In fact, it's larger in proportion to the Standard Deviation.
(But that's just because r happens to be the same for both charts.)

In general, changing the Standard Deviation of the y's will change r as well, so we can't conclude that the Error is smaller just because r is larger (closer to ±1).

>But it helps.
Yes. It helps.

Figure 2
>So that Error, SD2[y] (1 - r2) ... does it have a name?
Uh ... not that I know of. How about calling it Error?
>Very funny.
Besides, we're calling SD2[y] (1 - r2) the Error2   because it's the Mean Squared Error.

One thing that's a little bothersome is that the Error isn't symmetrical in x and y.
>Huh?
Although the correlation r is unchanged if the x's and y's are switched, the Error does change. That seems strange, doesn't it? I mean, if you want to know the error in fitting a straight line to (x,y) data, why should the resultant error depend upon which variable you choose as x and which as y? Taking the vertical distance to that "best line" gives the y's a special role.
We could introduce symmetry by calculating the average of the two Errors, when the x's and y's are switched:

symmetrical Error2 = (1/2)(SD2[x] + SD2[y] )(1 - r2)
Or (and this one I like better), we could take as Error2 the Mean Squared perpendicular distance of the points to that "best line". That'd give another symmetrical Error:
 [4] Error2 = (1 - r2)SD2[x]SD2[y]/(SD2[x] + SD2[y] )
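Both symmetric Errors are easy to check numerically; here's a sketch (with made-up data points) showing that neither one changes when the x's and y's are switched:

```python
# The two symmetric Errors above, computed both ways round to show they
# don't care which variable is x and which is y.  Data are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

mean = lambda v: sum(v) / len(v)

def stats(u, v):
    """Return (SD2[u], SD2[v], r2) for the two data sets."""
    mu, mv = mean(u), mean(v)
    var_u = mean([a * a for a in u]) - mu ** 2
    var_v = mean([b * b for b in v]) - mv ** 2
    covar = mean([a * b for a, b in zip(u, v)]) - mu * mv
    return var_u, var_v, covar ** 2 / (var_u * var_v)

def avg_err2(u, v):      # (1/2)(SD2[x] + SD2[y])(1 - r2)
    vu, vv, r2 = stats(u, v)
    return 0.5 * (vu + vv) * (1 - r2)

def perp_err2(u, v):     # formula [4]
    vu, vv, r2 = stats(u, v)
    return (1 - r2) * vu * vv / (vu + vv)

print(avg_err2(xs, ys), avg_err2(ys, xs))    # same either way round
print(perp_err2(xs, ys), perp_err2(ys, xs))  # same either way round
```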

Figure 3

>So does that Error have a name?
Uh ... not that I know of. How about calling it another ...?
>Forget it. So ... how about the slope and intercept?
You mean M and K? So, what about them?
>Do they change when you interchange x and y?
Uh ... good point. They do change.

Since the covariance, C, doesn't change, M is either C/A2 = COVAR[x,y]/SD2[x] or C/B2 = COVAR[x,y]/SD2[y].

There's this other thing called Beta, namely:
 [5] Beta[x,y] = COVAR[x,y] / SD2[x]

It's used to determine whether two time series (say the monthly S&P 500 returns and the returns for Microsoft) tend to move up or down together and ...

>Hey! That Beta is just the slope of that "best line" fit ... isn't it?
Yes. Beta[x,y] = M.
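Here's a small sketch computing Beta from formula [5]; the two return series are made up for illustration (x standing in for a hypothetical "market", y for a hypothetical stock):

```python
# Beta[x,y] = COVAR[x,y] / SD2[x], the slope of the "best line".
# The monthly returns below are made up for illustration.
market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]   # hypothetical x-Returns
stock = [0.03, -0.02, 0.05, 0.01, -0.04, 0.06]    # hypothetical y-Returns

mean = lambda v: sum(v) / len(v)
mx, my = mean(market), mean(stock)
covar = mean([x * y for x, y in zip(market, stock)]) - mx * my
var_x = mean([x * x for x in market]) - mx ** 2

beta = covar / var_x          # the slope M
alpha = my - beta * mx        # the intercept K, a.k.a. Alpha
print(beta, alpha)
```

Here beta comes out above 1: this made-up stock tends to move more than the made-up market.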
If the two sets, x and y, are daily (weekly? monthly?) Returns*, then a Beta[x,y] of 1.5 means that the increases in the y-Returns tend to be 1.5 times the increases in the x-Returns. That means that the y-Returns tend to change more than the x-Returns. That means ...

>But that Beta depends upon which set of returns you choose for x and y, right?
Yes. It could be COVAR[x,y] / SD2[x] or COVAR[x,y] / SD2[y], so one normally uses Beta to determine the relationship of stock returns to the Market ... in which case the x-Returns are, say, the TSE300 or the S&P500 or some other "benchmark" set of returns.

Here are some examples:

However, if the "best fit" line happens to pass through the origin, then K = Y - MX = 0, so the slope is just M = Y/X = Mean[y] / Mean[x].

>Pass through the origin? That means the Intercept = 0, right?
Yes, and the intercept also has a name. It's called ...
>It's called K, right?
Well ... uh, investment gurus call it Alpha.

Notice an interesting thing: if we're measuring the Beta of a set of returns with itself (so y = x) then Beta[x,x] = COVAR[x,x] / SD2[x] = 1.
(That just says that the "best fit" line, namely y = x, has slope = 1.)
That means that Beta of the Market is 1 ... since we'd be comparing the Market with itself!

Bloomberg (and others) define Beta as the slope of the "best line" fit when you plot excess returns: the stock against the Market.
>Excess?
Yes, the actual return less some risk-free return such as Money Market or maybe T-bills ... but we'll stick with the actual returns and forget using the excess.
>Does it make a difference?
Not really. Subtracting a constant risk-free rate, R, from the returns will give the same value for
Beta = COVAR[x,y] / SD2[x] ... since neither COVAR[x,y] nor SD[x] will change when x and y are replaced by x-R and y-R.
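A quick numerical check of both claims, with made-up return series and an assumed risk-free rate of 0.005 per period: Beta is unchanged by subtracting the constant, and the Market's Beta against itself is 1.

```python
# Subtracting a constant risk-free rate from both return series leaves
# Beta unchanged, and Beta[x,x] = 1.  Returns and the 0.005 risk-free
# rate are made up for illustration.
market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
stock = [0.03, -0.02, 0.05, 0.01, -0.04, 0.06]
rf = 0.005

mean = lambda v: sum(v) / len(v)

def beta(x, y):
    """COVAR[x,y] / SD2[x]."""
    mx, my = mean(x), mean(y)
    covar = mean([a * b for a, b in zip(x, y)]) - mx * my
    var_x = mean([a * a for a in x]) - mx ** 2
    return covar / var_x

excess_m = [x - rf for x in market]
excess_s = [y - rf for y in stock]

print(beta(market, stock), beta(excess_m, excess_s))  # equal
print(beta(market, market))                           # the Market vs itself: 1
```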