Multiple Linear Regression

We assume that some set of variables, $y_1, y_2, \dots, y_K$, is dependent upon variables $x_{k1}, x_{k2}, \dots, x_{kn}$ (for $k = 1$ to $K$).
We assume the relationship between the ys and xs is "almost" linear, like so:

[1]       $y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_n x_{1n} + \varepsilon_1$
          $y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_n x_{2n} + \varepsilon_2$
          ....
          $y_K = \beta_0 + \beta_1 x_{K1} + \beta_2 x_{K2} + \dots + \beta_n x_{Kn} + \varepsilon_K$

>Why so many variables?
Well, suppose we note that, when the xs have the values $x_{11}, x_{12}, \dots, x_{1n}$, the y-value is $y_1$.
We suspect an almost linear relationship, so we try again, noting that x-values $x_{21}, x_{22}, \dots, x_{2n}$ result in a y-value of $y_2$.
We continue, for $K$ observations, where, for x-values $x_{K1}, x_{K2}, \dots, x_{Kn}$, the result is a y-value of $y_K$.
Then, in an attempt to identify the "almost" linear relationship, we assume the relationship [1].

We can write this in matrix format, like so:
[2]       $\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1n} \\ 1 & x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{K1} & x_{K2} & \dots & x_{Kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_K \end{bmatrix}$
or, more elegantly:
[3]       $y = X\beta + \varepsilon$       where $y$, $\beta$ and $\varepsilon$ are column vectors and $X$ is a $K \times (n+1)$ matrix.
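Just to make the bookkeeping concrete, here's a small NumPy sketch (the numbers and array names are mine, purely for illustration) that stacks $K$ observations into that $K \times (n+1)$ matrix $X$, with the leading column of 1s that goes with $\beta_0$:

```python
import numpy as np

# K = 5 observations, n = 2 explanatory variables (illustrative sizes)
x_obs = np.array([[1.0, 2.0],
                  [2.0, 1.5],
                  [3.0, 0.5],
                  [4.0, 2.5],
                  [5.0, 1.0]])            # shape (K, n)
y = np.array([3.1, 4.0, 4.8, 7.2, 6.9])   # shape (K,)

# X is x_obs with a leading column of 1s, so X @ beta includes the intercept beta_0
X = np.column_stack([np.ones(len(x_obs)), x_obs])   # shape (K, n+1)
print(X.shape)   # (5, 3)
```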

We attempt to minimize the sum of the squares of the errors (also called "residuals") by clever choice of the parameters $\beta_0, \beta_1, \dots, \beta_n$:
        $E = \varepsilon_1^2 + \varepsilon_2^2 + \dots + \varepsilon_K^2 = \varepsilon^T \varepsilon$     where the row vector $\varepsilon^T$ denotes the transpose of $\varepsilon$.

(Note that this sum of squares is just the square of the magnitude of the vector ε ... so we're making the vector as small as possible.)
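Continuing the NumPy sketch above, the residual vector and the sum $E$ are one line each (the trial β here is an arbitrary guess, not yet the best one):

```python
beta = np.array([1.0, 1.2, 0.3])   # an arbitrary trial choice of (beta_0, beta_1, beta_2)

eps = y - X @ beta                 # residual vector, one entry per observation
E = eps @ eps                      # E = eps^T eps, the squared length of the residual vector
print(E)
```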

We set all the derivatives to zero, to locate the minimum.
For each j = 0 to n we have:
      $\frac{\partial E}{\partial \beta_j} = 2\,\varepsilon_1 \frac{\partial \varepsilon_1}{\partial \beta_j} + 2\,\varepsilon_2 \frac{\partial \varepsilon_2}{\partial \beta_j} + \dots + 2\,\varepsilon_K \frac{\partial \varepsilon_K}{\partial \beta_j} = 2 \sum \varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j} = 0$     the summation being from $k = 1$ to $K$.

Since $\varepsilon_k = y_k - \beta_0 - \beta_1 x_{k1} - \beta_2 x_{k2} - \dots - \beta_n x_{kn}$   (from [1]), we have $\frac{\partial \varepsilon_k}{\partial \beta_j} = -x_{kj}$ (taking $x_{k0} = 1$, the column of 1s in $X$), so:
$\varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j} = -x_{kj}\,[\,y_k - \beta_0 - \beta_1 x_{k1} - \beta_2 x_{k2} - \dots - \beta_n x_{kn}\,]$
$= -$(the $kj^{th}$ component of $X$) × [the $k^{th}$ component of $y - X\beta$]
$= -$(the $jk^{th}$ component of $X^T$) × [the $k^{th}$ component of $y - X\beta$]

then we have $n+1$ equations (for $j = 0$ to $n$) like:
[4]       $\sum \varepsilon_k \frac{\partial \varepsilon_k}{\partial \beta_j} = -\sum$ (the $jk^{th}$ component of $X^T$) × [the $k^{th}$ component of $y - X\beta$]   ... the summation running from $k = 1$ to $K$.

But the right-hand side of [4] is just minus the $j^{th}$ component of the $(n+1)$-component column vector $X^T[\,y - X\beta\,]$.

Setting them to zero gives us $n+1$ such linear equations to solve for the $n+1$ parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_n$, namely:
[5]       $X^T[\,y - X\beta\,] = 0$.
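If you'd like a numerical sanity check of [5] (using the $X$ and $y$ from the sketch above), the gradient of $E$ computed by finite differences should match $-2X^T[\,y - X\beta\,]$ at any trial β, so setting that gradient to zero is exactly [5]:

```python
def sum_sq(beta):
    eps = y - X @ beta
    return eps @ eps

beta = np.array([1.0, 1.2, 0.3])               # any trial beta
grad_formula = -2.0 * X.T @ (y - X @ beta)     # the left-hand side of [5], times -2

# finite-difference gradient of E, one component per beta_j
h = 1e-6
grad_fd = np.array([(sum_sq(beta + h * np.eye(3)[j]) - sum_sq(beta - h * np.eye(3)[j])) / (2 * h)
                    for j in range(3)])
print(np.allclose(grad_formula, grad_fd, atol=1e-4))   # True: setting the gradient to zero gives [5]
```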

>What about the xs and ys?
We know them. They're our observations and our goal is to determine the "almost" linear relationship between them.
That means finding the $(n+1)$ β-values, which we do by solving [5] for:
[6]       $\beta = (X^T X)^{-1} X^T y$   where $(X^T X)^{-1}$ denotes the inverse of $X^T X$.
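Here's [6] translated directly into NumPy (same $X$ and $y$ as before). In practice one would usually call np.linalg.solve or np.linalg.lstsq rather than form the inverse explicitly, but the one-liner mirrors the formula:

```python
# beta = (X^T X)^{-1} X^T y, literally as written in [6]
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# numerically safer equivalents (avoid forming an explicit inverse)
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))   # all three agree
```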

>Don't you find that ... uh, a little confusing?
It'll look better if we elevate its position, like so:
If we wish to find the "best" linear relationship between the values of $y_1, y_2, \dots, y_K$ and $x_{k1}, x_{k2}, \dots, x_{kn}$ (for $k = 1$ to $K$)
according to:
      $y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_n x_{1n} + \varepsilon_1$
      $y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_n x_{2n} + \varepsilon_2$
      ....
      $y_K = \beta_0 + \beta_1 x_{K1} + \beta_2 x_{K2} + \dots + \beta_n x_{Kn} + \varepsilon_K$
or, in matrix form (as in [2] and [3]):
      $y = X\beta + \varepsilon$

where the $K$-vector $\varepsilon$ denotes the errors (or residuals) in the linear approximation,
$y$ is a $K$-vector, $\beta$ an $(n+1)$-vector and $X$ a $K \times (n+1)$ matrix,
then we can minimize the size of the residuals by selecting the β-values according to:
      $\beta = (X^T X)^{-1} X^T y$

>Well it doesn't look better to me!
Here's an example where $K = n = 3$ and $\beta_0 = 0$ (so we're looking for $\beta_1$, $\beta_2$ and $\beta_3$, and we ignore that first column of 1s in $X$):

Pick some values for the $X$ matrix and the column vector $y$ (a small numerical sketch follows below).
We run through the ritual, calculating $X^T$ and $X^T X$ etc. etc. ... and finally the β parameters.
The result is the (almost linear) relationship between $y$ and the xs.
There are (as expected!) errors, denoted by $e_1$, $e_2$ and $e_3$ ... but they're pretty small.
If, instead of those "best" choices for the parameters, we had chosen a different set, say $\beta'$, the errors would be significantly greater.
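Since the original worked example appeared as a coloured figure, here's a stand-in NumPy sketch with made-up numbers (they are not the original's values) that goes through the same ritual for $K = n = 3$ and $\beta_0 = 0$:

```python
import numpy as np

# K = n = 3 with beta_0 = 0, so X has no column of 1s here.
# These particular numbers are invented for illustration only.
X3 = np.array([[1.0, 2.0, 1.0],
               [2.0, 1.0, 3.0],
               [3.0, 4.0, 2.0]])
y3 = X3 @ np.array([0.5, 1.5, -1.0]) + np.array([0.01, -0.02, 0.015])   # "almost" linear

# best betas, from [6]
beta_best = np.linalg.solve(X3.T @ X3, X3.T @ y3)
e_best = y3 - X3 @ beta_best
print(beta_best, e_best @ e_best)   # residuals essentially zero: a square, invertible X can be fit exactly

# a different choice beta' gives noticeably larger errors
beta_prime = beta_best + np.array([0.2, -0.1, 0.3])
e_prime = y3 - X3 @ beta_prime
print(e_prime @ e_prime)            # significantly greater
```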

>Is there a spreadsheet, so I can play?
Yes. Click here.