Multiple Linear Regression

We assume that some set of variables, y_{1}, y_{2}, ... y_{K}, is dependent upon variables
x_{k1}, x_{k2}, ... x_{kn} (for k = 1 to K).
We assume the relationship between the ys and xs is "almost" linear, like so:
[1] y_{1} = β_{0} + β_{1}x_{11} + β_{2}x_{12} + ... +β_{n}x_{1n} + ε_{1}
y_{2} = β_{0} + β_{1}x_{21} + β_{2}x_{22} + ... +β_{n}x_{2n} + ε_{2}
....
y_{K} = β_{0} + β_{1}x_{K1} + β_{2}x_{K2} + ... +β_{n}x_{Kn} + ε_{K}
>Why so many variables?
Well, suppose we note that, when the xs have the values x_{11}, x_{12}, ... x_{1n},
the y-value is y_{1}.
We suspect an almost linear relationship, so we try again, noting that x-values
x_{21}, x_{22}, ... x_{2n} result in a y-value of y_{2}.
We continue, for K observations, where, for x-values
x_{K1}, x_{K2}, ... x_{Kn} the result is a y-value of y_{K}.
Then in an attempt to identify the "almost" linear relationship, we assume the relationship [1].
We can write this in matrix format, like so:
[2]
[ y_{1} ]   [ 1 x_{11} x_{12} ... x_{1n} ] [ β_{0} ]   [ ε_{1} ]
[ y_{2} ] = [ 1 x_{21} x_{22} ... x_{2n} ] [ β_{1} ] + [ ε_{2} ]
[  ...  ]   [ ...                        ] [  ...  ]   [  ...  ]
[ y_{K} ]   [ 1 x_{K1} x_{K2} ... x_{Kn} ] [ β_{n} ]   [ ε_{K} ]
or, more elegantly:
[3] y = Xβ + ε
where y, β and ε are column vectors and X is a K x (n+1) matrix.
We attempt to minimize the sum of the squares of the errors (also called "residuals")
by clever choice of the parameters β_{0}, β_{1}, ... β_{n}.
E = ε_{1}^{2} + ε_{2}^{2} + ...+ε_{K}^{2}
= ε^{T}ε where the row vector ε^{T} denotes the transpose of ε.
(Note that this sum of squares is just the square of the magnitude of the vector ε ... so we're making the
vector as small as possible.)
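A quick numerical illustration of that remark, as a sketch in NumPy with made-up residuals: the sum of squares E is exactly ε^{T}ε, the squared magnitude of the vector ε.

```python
import numpy as np

# Made-up residuals for K = 4 observations (purely illustrative).
eps = np.array([0.3, -1.2, 0.5, 0.1])

E_sum = np.sum(eps**2)            # eps_1^2 + eps_2^2 + ... + eps_K^2
E_dot = eps @ eps                 # eps^T eps
E_norm = np.linalg.norm(eps)**2   # squared magnitude of the vector eps

print(E_sum, E_dot, E_norm)       # all three agree
```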
We set all the derivatives to zero, to locate the minimum.
For each j = 0 to n we have:
∂E / ∂β_{j}
= 2 ε_{1} ∂ε_{1}/∂β_{j}
+ 2 ε_{2} ∂ε_{2}/∂β_{j} + ...
+ 2 ε_{K} ∂ε_{K}/∂β_{j}
= 2 Σ ε_{k} ∂ε_{k}/∂β_{j} = 0
the summation being from k = 1 to K.
Since ε_{k} = y_{k} − β_{0} − β_{1}x_{k1} − β_{2}x_{k2} − ... − β_{n}x_{kn}
(from [1]) then:
ε_{k} ∂ε_{k}/∂β_{j}
= [y_{k} − β_{0} − β_{1}x_{k1} − β_{2}x_{k2} − ... − β_{n}x_{kn}] (−x_{kj})
= −(x_{kj}) [y_{k} − β_{0} − β_{1}x_{k1} − β_{2}x_{k2} − ... − β_{n}x_{kn}]
  (where we set x_{k0} = 1, so this covers j = 0 as well)
= −(the kj^{th} component of X)*[the k^{th} component of y − Xβ]
= −(the jk^{th} component of X^{T})*[the k^{th} component of y − Xβ]

Then we have n+1 equations (for j = 0 to n) like:
[4] Σ ε_{k} ∂ε_{k}/∂β_{j} = −Σ (the jk^{th} component of X^{T})*[the k^{th} component of y − Xβ]
... summed over k = 1 to K.
But [4] defines (apart from the sign) the j^{th} component of the (n+1)-component column vector: X^{T} [ y − Xβ ].
Setting them to zero gives us n+1 such linear equations to solve for the n+1 parameters β_{0}, β_{1}, β_{2} ... β_{n}, namely:
[5] X^{T} [ y − Xβ ] = 0.
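As a sanity check on the derivation, here's a sketch (NumPy, made-up data): the analytic gradient ∂E/∂β = −2 X^{T}(y − Xβ), read off from the steps above, should agree with a finite-difference estimate of each partial derivative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: K = 20 observations, n = 3 predictors plus a column of 1s.
K, n = 20, 3
X = np.column_stack([np.ones(K), rng.normal(size=(K, n))])
y = rng.normal(size=K)
beta = rng.normal(size=n + 1)    # an arbitrary (not optimal) parameter vector

def E(b):
    e = y - X @ b
    return e @ e                 # sum of squared residuals

# Analytic gradient from the derivation: dE/dbeta = -2 X^T (y - X beta).
grad_analytic = -2 * X.T @ (y - X @ beta)

# Central finite differences, one partial derivative per parameter.
h = 1e-6
grad_numeric = np.array([
    (E(beta + h * np.eye(n + 1)[j]) - E(beta - h * np.eye(n + 1)[j])) / (2 * h)
    for j in range(n + 1)
])

print(np.max(np.abs(grad_analytic - grad_numeric)))  # tiny
```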
>What about the xs and ys?
We know them. They're our observations and our goal is to determine the "almost" linear relationship between them.
That means finding the (n+1) β-values which we do by solving [5] for:
[6] β = (X^{T}X)^{−1}X^{T}y
where (X^{T}X)^{−1} denotes the inverse of X^{T}X.
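In code, [6] amounts to solving the normal equations. A minimal sketch with NumPy and made-up observations (numerically it's better to solve X^{T}Xβ = X^{T}y than to form the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up observations: K = 50 rows, n = 2 predictors, plus an intercept.
K, n = 50, 2
X = np.column_stack([np.ones(K), rng.normal(size=(K, n))])  # first column of 1s for beta_0
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + 0.1 * rng.normal(size=K)   # "almost" linear: small errors epsilon

# Equation [6]: beta = (X^T X)^{-1} X^T y, computed by solving the
# normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)
```

At the solution, equation [5] holds: X^{T}(y − Xβ) is (numerically) the zero vector.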
>Don't you find that ... uh, a little confusing?
It'll look better if we elevate its position like so:
If we wish to find the "best" linear relationship between the values of y_{1}, y_{2}, ... y_{K} and
x_{k1}, x_{k2}, ... x_{kn} (for k = 1 to K)
according to:
y_{1} = β_{0} + β_{1}x_{11} + β_{2}x_{12} + ... +β_{n}x_{1n} + ε_{1}
y_{2} = β_{0} + β_{1}x_{21} + β_{2}x_{22} + ... +β_{n}x_{2n} + ε_{2}
....
y_{K} = β_{0} + β_{1}x_{K1} + β_{2}x_{K2} + ... +β_{n}x_{Kn} + ε_{K}
or
y = Xβ + ε
where the K-vector ε denotes the errors (or residuals) in the linear approximation,
y is a K-vector, β an (n+1)-vector and X a K x (n+1) matrix,
then we can minimize the size of the residuals by selecting the β-values according to:
β = (X^{T}X)^{−1}X^{T}y

>Well it doesn't look better to me!
Here's an example where K = n = 3 and β_{0} = 0 (so we're looking for
β_{1}, β_{2} and β_{3} and we ignore that first column of 1s in X):
Note the assumed values for the X matrix and the column vector y (coloured in the figure).
We run thru' the ritual, calculating X^{T} and X^{T}X etc. etc. ... and finally the β parameters.
The resultant (almost linear) relationship is inside the blue box.
There are (as expected!) errors, denoted by e1, e2 and e3 ... but they're pretty small.
If, instead of those "best" choices for the parameters, we had chosen a different set, say β',
the errors would be significantly greater.
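Since the coloured values live in the figure, here's a comparable sketch in NumPy with made-up numbers: K = n = 3 and β_{0} = 0 (no column of 1s), an exact solve for β, a rounded β with small errors, and a different guess β' with much larger ones.

```python
import numpy as np

# Hypothetical data (the page's coloured X and y values are in the figure,
# so these numbers are invented for illustration). K = n = 3, beta_0 = 0.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 3.0, 2.0]])
y = np.array([4.13, 7.91, 6.24])

# With K = n and an invertible X, the least-squares solution fits exactly.
beta = np.linalg.solve(X.T @ X, X.T @ y)

def sum_sq_err(b):
    e = y - X @ b
    return e @ e

# Rounding beta to "nice" values leaves small but nonzero errors e1, e2, e3,
# while a different choice beta' does noticeably worse.
beta_rounded = np.round(beta, 1)
beta_prime = beta_rounded + np.array([0.5, -0.5, 0.5])

print(sum_sq_err(beta))          # essentially zero
print(sum_sq_err(beta_rounded))  # small
print(sum_sq_err(beta_prime))    # much larger
```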
>Is there a spreadsheet, so I can play?
Yes. Click here.
