Linear Regression 6 | Sum of Squared Errors, Sum of Squares, and Type 1,2,3 ANOVA for MLR

Series: Linear Regression

Sum of Squared Errors for MLR

(1) The Definition of the Sum of Squared Errors (SSE)

The sum of squared errors (SSE), also called the residual sum of squares, is by definition the sum of the squared residuals.
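As a minimal numerical sketch of this definition, the snippet below fits an OLS model with NumPy on a small made-up data set and sums the squared residuals. The data, the helper name `sse_from_residuals`, and the choice of `numpy.linalg.lstsq` are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def sse_from_residuals(X, y):
    """Return the sum of squared residuals of the OLS fit of y on X."""
    # Least-squares solution of X @ beta = y (hypothetical helper for illustration).
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    return float(residuals @ residuals)

# Made-up data: 5 observations, an intercept column and one regressor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

sse = sse_from_residuals(X, y)
print(round(sse, 6))  # 3.6
```

For this data the fitted line is 1.4 + 0.8x, so the residuals are -0.4, 0.8, -1.0, 1.2, -0.6 and their squares sum to 3.6.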

(2) Formula #1 of the Sum of Squared Errors

Proof:

By the model of MLR,

then,

By the definition of the residual,

then,

then,

then,

Because H-bar is a symmetric matrix, we have,

Also, because H-bar is an idempotent matrix, we have,
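Written out, the steps of this proof can be sketched as follows. The notation is an assumption based on the usual conventions: $\hat{\beta}$ is the OLS estimator, $H = X(X^\top X)^{-1}X^\top$ is the hat matrix, and $\bar{H} = I_n - H$ is the H-bar matrix.

```latex
\begin{align*}
\hat{y} &= X\hat{\beta} = X(X^\top X)^{-1}X^\top y = Hy \\
e &= y - \hat{y} = y - Hy = (I_n - H)\,y = \bar{H}y \\
\mathrm{SSE} &= e^\top e = (\bar{H}y)^\top(\bar{H}y) = y^\top \bar{H}^\top \bar{H}\, y \\
&= y^\top \bar{H}\bar{H}\, y && \text{since } \bar{H}^\top = \bar{H} \\
&= y^\top \bar{H}\, y && \text{since } \bar{H}\bar{H} = \bar{H}
\end{align*}
```

Under these assumptions, formula #1 reads $\mathrm{SSE} = y^\top \bar{H} y$.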

(3) Formula #2 of the Sum of Squared Errors

Proof:

Because H is the projection matrix that maps any vector onto the column space of X, and each column of X already lies in that space, we must have,

then,

that is to say,

thus, in conclusion,

By formula #1, we have,

By the true model of MLR,

then,

then,

then,

then,

then,

By the fact that,

then,
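The argument of this proof can be sketched in symbols as follows, assuming the true model $y = X\beta + \varepsilon$ and the notation $H = X(X^\top X)^{-1}X^\top$, $\bar{H} = I_n - H$.

```latex
\begin{align*}
HX &= X(X^\top X)^{-1}X^\top X = X \\
\bar{H}X &= (I_n - H)X = X - X = 0, \qquad X^\top \bar{H} = (\bar{H}X)^\top = 0 \\
\mathrm{SSE} &= y^\top \bar{H}\, y = (X\beta + \varepsilon)^\top \bar{H}\,(X\beta + \varepsilon) \\
&= \beta^\top X^\top \bar{H} X \beta + \beta^\top X^\top \bar{H}\varepsilon
   + \varepsilon^\top \bar{H} X \beta + \varepsilon^\top \bar{H}\varepsilon \\
&= \varepsilon^\top \bar{H}\, \varepsilon
\end{align*}
```

The first three terms of the expansion vanish because $\bar{H}X = 0$, which leaves $\mathrm{SSE} = \varepsilon^\top \bar{H} \varepsilon$ under these assumptions.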

(4) Formula #3 of the Sum of Squared Errors

Proof:

By the definition of the SSE,

then,

By the solution of the OLS estimator,

then,

then,

then,

then,

(5) Formula #4 of the Sum of Squared Errors

Proof:

By the definition of the SSE,

then,

then,

By the solution of the OLS estimator,

then,

then,

then,

then,

then,

then,
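The different expressions for the SSE can be checked numerically. The sketch below simulates data from an assumed true model y = Xβ + ε and compares the definition eᵀe with the two quadratic forms yᵀH̄y and εᵀH̄ε, and with two expressions that follow from the normal equations XᵀXβ̂ = Xᵀy. Treating those last two as the formulas derived in parts (4) and (5) is an assumption here; the simulated data, the seed, and all variable names are illustrative as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Simulated data from an assumed true model y = X beta + eps.
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])
eps = rng.normal(size=n)
y = X @ beta + eps

# OLS estimator, hat matrix H, and H-bar = I - H.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
H = X @ XtX_inv @ X.T
H_bar = np.eye(n) - H

e = y - X @ beta_hat                              # residual vector
sse_def = e @ e                                   # definition: sum of squared residuals
sse_f1 = y @ H_bar @ y                            # quadratic form in y
sse_f2 = eps @ H_bar @ eps                        # quadratic form in the true errors
sse_ne1 = y @ y - beta_hat @ X.T @ y              # via the normal equations (assumed form)
sse_ne2 = y @ y - beta_hat @ X.T @ X @ beta_hat   # equivalent variant (assumed form)

print(np.allclose([sse_f1, sse_f2, sse_ne1, sse_ne2], sse_def))
```

All five values agree up to floating-point error, which is what the algebra predicts whenever X has full column rank.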

(6) Trace of the Hat Matrix

Suppose we are given k independent (explanatory) variables. Then, by the definition of the matrix X, X is an n × k matrix. To obtain the OLS estimator, we have to assume that X has rank k (full column rank). This requires that we have enough observations for this model (at least n ≥ k) and that no column of X is an exact linear combination of the others. Then,

Then, by the definition of the hat matrix, which is the projection matrix onto the column space of X, that is,

Note that H is an n × n matrix.

then,

By the cyclic property of the trace, tr(AB) = tr(BA) whenever both products are defined,

By definition of the inverse matrix,

So the trace of the hat matrix equals k.
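In symbols, the chain of steps above can be sketched as follows, assuming $H = X(X^\top X)^{-1}X^\top$ with $X$ of size $n \times k$ and rank $k$, so that $X^\top X$ is an invertible $k \times k$ matrix.

```latex
\begin{align*}
\operatorname{tr}(H) &= \operatorname{tr}\!\left(X (X^\top X)^{-1} X^\top\right) \\
&= \operatorname{tr}\!\left((X^\top X)^{-1} X^\top X\right) && \text{cyclic property, } \operatorname{tr}(AB) = \operatorname{tr}(BA) \\
&= \operatorname{tr}(I_k) && \text{definition of the inverse} \\
&= k
\end{align*}
```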

(7) Trace of the Hat-bar Matrix

Based on the discussion above, the trace of the H-bar matrix is n − k.

Proof:

By definition,

By property of the matrix trace,

By our discussions above,
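Both trace results can be checked numerically. The following sketch builds the hat matrix for a randomly generated design matrix and compares the traces with k and n − k; the sizes, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4

# A random Gaussian design matrix has full column rank with probability one.
X = rng.normal(size=(n, k))

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, n x n
H_bar = np.eye(n) - H                  # H-bar matrix, n x n

trace_H = np.trace(H)
trace_H_bar = np.trace(H_bar)

print(np.isclose(trace_H, k), np.isclose(trace_H_bar, n - k))

# H-bar is symmetric and idempotent, as used in the proof of formula #1.
print(np.allclose(H_bar, H_bar.T), np.allclose(H_bar @ H_bar, H_bar))
```

Because both matrices are idempotent, each trace also equals the rank of the matrix, which is why the values come out as whole numbers up to rounding.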

(8) Cochran’s Theorem

Suppose we are given a vector x of i.i.d. random variables x1, x2, …, xn that follow the same Gaussian distribution. Suppose we are also given a symmetric idempotent matrix A with a trace of p (so that its rank is also p). Then we have,