Multiple Linear Regression

We assume that a set of variables, $y_1, y_2, \ldots, y_K$, depends on the variables $x_{k1}, x_{k2}, \ldots, x_{kn}$ (for $k = 1$ to $K$).
We assume the relationship between the ys and xs is "almost" linear, like so:

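[1]
$$
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_n x_{1n} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_n x_{2n} + \varepsilon_2 \\
&\;\;\vdots \\
y_K &= \beta_0 + \beta_1 x_{K1} + \beta_2 x_{K2} + \cdots + \beta_n x_{Kn} + \varepsilon_K
\end{aligned}
$$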

>Why so many variables?
Well, suppose we note that, when the xs have the values $x_{11}, x_{12}, \ldots, x_{1n}$, the y-value is $y_1$.
We suspect an almost linear relationship, so we try again, noting that x-values $x_{21}, x_{22}, \ldots, x_{2n}$ result in a y-value of $y_2$.
We continue, for K observations, where, for x-values $x_{K1}, x_{K2}, \ldots, x_{Kn}$, the result is a y-value of $y_K$.
Then in an attempt to identify the "almost" linear relationship, we assume the relationship [1].

We can write this in matrix format, like so:
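[2]
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1n} \\
1 & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{K1} & x_{K2} & \cdots & x_{Kn}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_K \end{bmatrix}
$$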
or, more elegantly:
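[3]       $y = X\beta + \varepsilon$       where $y$ and $\varepsilon$ are column vectors with K components, $\beta$ is a column vector with n+1 components, and $X$ is a K x (n+1) matrix whose first column is all 1s.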
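To make the matrix form concrete, here's a minimal sketch in Python with NumPy (our choice of tool, not something the derivation depends on). The function name `design_matrix` and all the numbers are made up for illustration: it builds X by putting a column of 1s in front of the observed x-values, then forms y from [3].

```python
import numpy as np

def design_matrix(x_obs):
    """Return the K x (n+1) matrix X: a column of 1s followed by the observed x-values."""
    x_obs = np.asarray(x_obs, dtype=float)
    ones = np.ones((x_obs.shape[0], 1))
    return np.hstack([ones, x_obs])

# Hypothetical data: K = 4 observations of n = 2 x-variables.
x_obs = [[1.0, 2.0],
         [2.0, 1.0],
         [3.0, 5.0],
         [4.0, 3.0]]
X = design_matrix(x_obs)

beta = np.array([0.5, 2.0, -1.0])        # hypothetical beta_0, beta_1, beta_2
eps = np.array([0.1, -0.2, 0.05, 0.0])   # hypothetical errors
y = X @ beta + eps                       # equation [3]

print(X.shape)   # K rows, n+1 columns
print(y)
```

Each row of `X @ beta + eps` is one line of [1]; the column of 1s is what carries the constant term.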

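We attempt to minimize the sum of the squares of the errors (also called "residuals") by clever choice of the parameters $\beta_0, \beta_1, \ldots, \beta_n$.
$$E = \varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_K^2 = \varepsilon^T \varepsilon$$
where the row vector $\varepsilon^T$ denotes the transpose of $\varepsilon$.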

(Note that this sum of squares is just the square of the magnitude of the vector ε ... so we're making the vector as small as possible.)
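Here's a small sketch of that quantity in Python with NumPy, assuming made-up values for X, y and β (the name `sum_squared_errors` is ours too). It computes E as the dot product of the error vector with itself and compares it with the squared length of that vector.

```python
import numpy as np

def sum_squared_errors(X, y, beta):
    """E = eps^T eps, where eps = y - X beta."""
    eps = y - X @ beta
    return eps @ eps

# Hypothetical data: K = 3 observations, one x-variable plus the column of 1s.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
beta = np.array([0.5, 0.5])   # a trial choice of beta_0, beta_1

E = sum_squared_errors(X, y, beta)
length_squared = np.linalg.norm(y - X @ beta) ** 2

print(E, length_squared)   # the two numbers should agree
```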

We set all the derivatives to zero, to locate the minimum.
For each j = 0 to n we have:
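$$\frac{\partial E}{\partial \beta_j} = 2\varepsilon_1 \frac{\partial \varepsilon_1}{\partial \beta_j} + 2\varepsilon_2 \frac{\partial \varepsilon_2}{\partial \beta_j} + \cdots + 2\varepsilon_K \frac{\partial \varepsilon_K}{\partial \beta_j} = 2\sum_{k=1}^{K} \varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j} = 0$$
the summation being from k = 1 to K.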

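Since $\varepsilon_k = y_k - \beta_0 - \beta_1 x_{k1} - \beta_2 x_{k2} - \cdots - \beta_n x_{kn}$ (from [1]), we have $\partial \varepsilon_k / \partial \beta_j = -x_{kj}$ (taking $x_{k0} = 1$ for the constant term, which is the first column of X), so:
$$
\begin{aligned}
\varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j}
&= [\,y_k - \beta_0 - \beta_1 x_{k1} - \beta_2 x_{k2} - \cdots - \beta_n x_{kn}\,]\,(-x_{kj}) \\
&= -x_{kj}\,[\,y_k - \beta_0 - \beta_1 x_{k1} - \beta_2 x_{k2} - \cdots - \beta_n x_{kn}\,] \\
&= -(\text{the } kj\text{th component of } X)\,[\text{the } k\text{th component of } y - X\beta] \\
&= -(\text{the } jk\text{th component of } X^T)\,[\text{the } k\text{th component of } y - X\beta]
\end{aligned}
$$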

Then we have n+1 equations (for j = 0 to n) like:
[4]       $$\sum_{k} \varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j} = -\sum_{k} (X^T)_{jk}\,(y - X\beta)_k$$   ... summed over k = 1 to K.

But the sum on the right of [4] is just the jth component of the (n+1)-component column vector $X^T [\,y - X\beta\,]$.

Setting them to zero gives us n+1 such linear equations to solve for the n+1 parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$, namely:
[5]       $X^T [\,y - X\beta\,] = 0$.
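As a sanity check on the derivative calculation, here's a sketch in Python with NumPy that compares the formula $\partial E/\partial \beta_j = -2\,[X^T(y - X\beta)]_j$ against finite-difference estimates. The data, the trial β and the function names are all made up for illustration.

```python
import numpy as np

def sse(X, y, beta):
    """Sum of squared errors E = eps^T eps."""
    eps = y - X @ beta
    return eps @ eps

def sse_gradient(X, y, beta):
    """Derivatives of E with respect to each beta_j: -2 X^T (y - X beta)."""
    return -2.0 * X.T @ (y - X @ beta)

def numerical_gradient(X, y, beta, h=1e-6):
    """Central-difference estimate of the same derivatives."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        grad[j] = (sse(X, y, beta + step) - sse(X, y, beta - step)) / (2 * h)
    return grad

# Hypothetical data: K = 4 observations, n = 2 x-variables, first column all 1s.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0],
              [1.0, 4.0, 3.0]])
y = np.array([0.6, 3.3, 1.55, 5.5])
beta = np.array([0.2, 1.0, -0.5])   # an arbitrary trial beta, not the best one

analytic = sse_gradient(X, y, beta)
numeric = numerical_gradient(X, y, beta)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places; setting the analytic one to zero is exactly condition [5].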

>What about the xs and ys?
We know them. They're our observations and our goal is to determine the "almost" linear relationship between them.
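That means finding the (n+1) β-values, which we do by rewriting [5] as $X^T X \beta = X^T y$ and (assuming $X^T X$ has an inverse) solving for β:
[6]       $\beta = (X^T X)^{-1} X^T y$   where $(X^T X)^{-1}$ denotes the inverse of $X^T X$.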
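In code, [6] might look like the sketch below (Python with NumPy, invented data, and a function name of our own). Rather than forming the inverse explicitly, it solves the linear system $X^T X \beta = X^T y$, which gives the same β whenever $X^T X$ is invertible. NumPy's own least-squares routine, `np.linalg.lstsq`, is called alongside purely as a cross-check.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve X^T X beta = X^T y for beta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: K = 5 observations of n = 2 x-variables.
x_obs = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 5.0],
                  [4.0, 3.0],
                  [5.0, 4.0]])
y = np.array([1.1, 3.9, 1.2, 5.8, 6.1])

# Put the column of 1s in front, so beta[0] plays the role of beta_0.
X = np.column_stack([np.ones(len(x_obs)), x_obs])

beta = fit_normal_equations(X, y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # cross-check only

residual = y - X @ beta
print(beta)
print(beta_lstsq)
print(X.T @ residual)   # should be (nearly) the zero vector, as [5] requires
```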

>Don't you find that ... uh, a little confusing?
It'll look better if we elevate its position like so:
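If we wish to find the "best" linear relationship between the values of $y_1, y_2, \ldots, y_K$ and $x_{k1}, x_{k2}, \ldots, x_{kn}$ (for k = 1 to K) according to:
$$
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_n x_{1n} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_n x_{2n} + \varepsilon_2 \\
&\;\;\vdots \\
y_K &= \beta_0 + \beta_1 x_{K1} + \beta_2 x_{K2} + \cdots + \beta_n x_{Kn} + \varepsilon_K
\end{aligned}
$$
or
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1n} \\
1 & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{K1} & x_{K2} & \cdots & x_{Kn}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_K \end{bmatrix}
$$
or
$$y = X\beta + \varepsilon$$
where the K-vector $\varepsilon$ denotes the errors (or residuals) in the linear approximation, $y$ is a K-vector, $\beta$ an (n+1)-vector and $X$ a K x (n+1) matrix, then we can minimize the size of the residuals by selecting the β-values according to:
$$\beta = (X^T X)^{-1} X^T y$$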
>Well it doesn't look better to me!
Here's an example where K = n = 3 and $\beta_0 = 0$ (so we're looking for $\beta_1$, $\beta_2$ and $\beta_3$ and we ignore that first column of 1s in X):

Note the assumed values for the X matrix and the column vector y, which are colour-coded.
We run through the ritual, calculating $X^T$, $X^T X$ and so on, and finally the β parameters, which are also colour-coded.
The resultant (almost linear) relationship is inside the blue box.
There are (as expected!) errors, denoted by $e_1$, $e_2$ and $e_3$, but they're pretty small. They're colour-coded too.
If, instead of those "best" choices for the parameters, we had chosen a different set, say β' (also colour-coded), the errors would be significantly greater.
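For anyone who'd like to run the ritual themselves, here's a sketch in Python with NumPy that follows the same set-up (K = n = 3, no column of 1s). The X and y values are invented for illustration and are not the ones from the example above, and the "different set" β' is just the best β nudged by an arbitrary amount.

```python
import numpy as np

# Hypothetical observations: 3 rows (K = 3), 3 x-variables (n = 3), beta_0 = 0.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 1.0],
              [3.0, 1.0, 2.0]])
y = np.array([10.0, 7.0, 11.0])

# The ritual: X^T, X^T X, then beta from X^T X beta = X^T y.
Xt = X.T
XtX = Xt @ X
beta = np.linalg.solve(XtX, Xt @ y)

errors = y - X @ beta           # e_1, e_2, e_3 for the "best" beta
E_best = errors @ errors

# A different, arbitrary choice of parameters for comparison.
beta_other = beta + np.array([0.5, -0.5, 0.25])
errors_other = y - X @ beta_other
E_other = errors_other @ errors_other

print(beta)
print(errors, E_best)
print(errors_other, E_other)
```

With as many observations as parameters and an invertible X, as in this made-up data, the best β fits the observations exactly apart from round-off, so the errors come out tiny; the nudged β' gives a visibly larger sum of squares.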

>Is there a spreadsheet, so I can play?