Linear Algebra 16 | A Quick Review

1. Vectors
 
(1) Definitions, stacking, slicing
- Definition: a vector is an ordered list of numbers
 - Stacked vector (concatenation): the stacked vector of b, c, and d is a = (b, c, d)
 - Size: the size of a vector is the number of entries in the vector
 - Zeros: the zeros vector has all entries equal to 0
 - Ones: the ones vector has all entries equal to 1
 - Unit vector: has one entry 1 and all others 0
 - Sparsity: a vector is sparse if many of its entries are 0; nnz(x) is the number of entries of x that are nonzero
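A minimal numpy sketch of these definitions (numpy and the example vectors are assumptions here, not part of the original notes):

```python
import numpy as np

b = np.array([1.0, 2.0])
c = np.array([3.0])
d = np.array([4.0, 5.0])

a = np.concatenate([b, c, d])    # stacked vector a = (b, c, d)
print(a.size)                    # size: number of entries (5)
print(np.zeros(5), np.ones(5))   # zeros vector and ones vector
e2 = np.zeros(5); e2[1] = 1.0    # unit vector e2 (single 1 in the second entry)
print(np.count_nonzero(a))       # nnz(a): number of nonzero entries
```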
 
(2) Addition/subtraction, scalar multiplication
- commutative: a+b = b+a
 - associative: (a+b)+c = a+(b+c)
 - zero: a+0=0+a=a
 - self-subtraction: a − a = 0
 - scalar multiplication: βa = (βa1, …, βan)
 - associative: (βγ)a = β(γa)
 - left distributive: (β + γ)a = βa + γa
 - right distributive: β(a + b) = βa + βb
 
(3) Inner product, norm, distance, RMS value
- inner product (or dot product) of n-vectors a and b is: aTb=a1b1 +a2b2 +···+anbn (a value)
 - aTa=a1² +a2² +···+an² ≥0
 - aTa=0 if and only if a=0
 - commutative: aTb = bTa
 - associative with scalar multiplication: (γa)T b = γ(aT b)
 - distributive with vector addition: (a + b)Tc = aTc + bTc
 - combining the above: (a + b)T(c + d) = aTc + aTd + bTc + bTd
 - Euclidean norm (norm): used to measure the size or magnitude of a vector,

∥x∥ = √(x1² + x2² + ··· + xn²) = √(xTx)

- Norm homogeneity: ∥βx∥ = |β|∥x∥
 - Norm triangle inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥
 - Norm nonnegativity:∥x∥ ≥ 0
 - Norm definiteness: ∥x∥ = 0 only if x = 0
 - Mean-square value: ms(x) = (x1² + ··· + xn²)/n = ∥x∥²/n


- root-mean-square value (RMS value) is rms(x) = √(ms(x)) = ∥x∥/√n


Note that the RMS value is useful for comparing vectors of different dimensions
- Norm of block vectors: ∥(a, b, c)∥ = √(∥a∥² + ∥b∥² + ∥c∥²)

- Euclidean distance: dist(a, b) = ∥a − b∥

- RMS deviation: rms(a − b)
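A short numpy sketch of the inner product, norm, RMS value, and distance (the vectors here are assumed example data):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

ip = a @ b                          # inner product aTb
norm_a = np.linalg.norm(a)          # Euclidean norm ||a||
rms_a = norm_a / np.sqrt(a.size)    # rms(a) = ||a|| / sqrt(n)
dist = np.linalg.norm(a - b)        # Euclidean distance dist(a, b) = ||a - b||
rms_dev = dist / np.sqrt(a.size)    # RMS deviation rms(a - b)
```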
 
(4) Mean and standard deviation
- average (vector form): avg(x) = (x1 + ··· + xn)/n = 1Tx/n


- de-meaned vector: x̃ = x − avg(x)1


- standard deviation: std(x) = rms(x̃) = ∥x − avg(x)1∥/√n


- relationship between standard deviation, average, and rms: rms(x)² = avg(x)² + std(x)²
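A numpy check of these definitions and of the rms-avg-std relationship (the data vector is an assumed example):

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0, 7.0])
n = x.size

avg = np.ones(n) @ x / n                        # avg(x) = 1Tx / n
x_tilde = x - avg * np.ones(n)                  # de-meaned vector
std = np.linalg.norm(x_tilde) / np.sqrt(n)      # std(x) = ||x - avg(x)1|| / sqrt(n)
rms = np.linalg.norm(x) / np.sqrt(n)            # rms(x)

assert np.isclose(rms**2, avg**2 + std**2)      # rms(x)^2 = avg(x)^2 + std(x)^2
```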
 

(5) Cauchy-Schwarz inequality, angle
- Cauchy-Schwarz inequality: |aTb| ≤ ∥a∥∥b∥; it can be used to prove the triangle inequality

- angle between two nonzero vectors a, b defined as θ = arccos( aTb / (∥a∥∥b∥) )


This is also, aTb = ∥a∥∥b∥ cos θ

- orthogonal angle: θ = π/2 = 90°, aTb=0
 - aligned: θ = 0, aTb=∥a∥∥b∥
 - anti-aligned: θ = π = 180°, aT b = −∥a∥∥b∥
 - acute angle: θ ≤ π/2 = 90°, aTb≥0
 - obtuse angle: θ ≥ π/2 = 90°, aTb≤0
 
(6) Correlation coefficient
- correlation coefficient (between a and b, with ã ≠ 0, b̃ ≠ 0): ρ = ãTb̃ / (∥ã∥∥b̃∥)


Where, ã = a − avg(a)1 and b̃ = b − avg(b)1 are the de-meaned vectors

This is also, ρ = uTv, with u = ã/∥ã∥ and v = b̃/∥b̃∥
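A numpy sketch of the angle and the correlation coefficient (the two vectors are assumed example data):

```python
import numpy as np

a = np.array([1.0, 2.0, -1.0])
b = np.array([2.0, 0.0, -3.0])

# angle between a and b
theta = np.arccos((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# correlation coefficient of the de-meaned vectors
at = a - a.mean()
bt = b - b.mean()
rho = (at @ bt) / (np.linalg.norm(at) * np.linalg.norm(bt))
print(np.degrees(theta), rho)
```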

2. Linear Combinations
(1) Linear combination of vectors
For vectors a1, …, am and scalars β1, …, βm, the vector β1a1 + ··· + βmam is called a linear combination of the vectors.
(2) Linear dependence and independence
(3) Independence-dimension inequality
(4) Basis
(5) Orthonormal set of vectors
(6) Gram-Schmidt algorithm
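A minimal Python sketch of the Gram-Schmidt algorithm from (6), assuming numpy; the tolerance and the early return on a dependent vector are implementation choices, not part of the original notes:

```python
import numpy as np

def gram_schmidt(a_list, tol=1e-10):
    # Orthonormalize a1, ..., ak; stop early if a dependent vector is found.
    q = []
    for a in a_list:
        qt = a - sum((qi @ a) * qi for qi in q)   # subtract projections onto earlier q's
        if np.linalg.norm(qt) <= tol:             # a is (numerically) dependent on earlier vectors
            return q
        q.append(qt / np.linalg.norm(qt))         # normalize
    return q

q = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
```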
3. Matrices
(1) Definitions, notation, block matrices, submatrices
- Definitions: a matrix is a rectangular array of numbers
 - Size: (row dimension) × (column dimension)
 - Entry: elements also called entries or coefficients
 - Equality: two matrices are equal (denoted with “=”) if they are the same size and corresponding entries are equal
 - Tall matrix: if m > n
 - Wide matrix: if m<n
 - Square matrix: if m=n
 - Vector and matrix: we consider an n × 1 matrix to be an n-vector
 - Number and matrix: we consider a 1×1 matrix to be a number
 - Row vector: a 1 × n matrix is called a row vector
 - block matrices: matrices whose entries are themselves matrices, e.g., the 2 × 2 block matrix A = [B C; D E],


where B, C, D, and E are matrices (called submatrices or blocks of A)
- directed graph: a relation (a set of ordered pairs of objects labeled 1, …, n) can be represented as a directed graph
 


- n × n matrix A is the adjacency matrix of a directed graph: Aij = 1 if there is an edge from vertex j to vertex i, and Aij = 0 otherwise

- paths in directed graph: (A²)ij is number of paths of length 2 from j to i
 

- zero matrix: m × n zero matrix has all entries zero
 - identity matrix: a square matrix with Iii = 1 and Iij = 0 for i ≠ j
 

- sparse matrix: a matrix is sparse if most (almost all) of its entries are zero
 - diagonal matrix: square matrix with Aij = 0 when i ≠ j
 - lower triangular matrix: Aij = 0 for i < j
 - upper triangular matrix: Aij = 0 for i > j
 - transpose matrix: the transpose of an m × n matrix A is denoted AT, and defined by (AT)ij = Aji for i = 1, …, n and j = 1, …, m

- Difference matrix: the (n − 1) × n difference matrix D maps x to the vector of consecutive differences, Dx = (x2 − x1, x3 − x2, …, xn − xn−1)


- Dirichlet energy: ∥Dx∥² is a measure of the wiggliness of a time series x
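A numpy sketch of the difference matrix and the Dirichlet energy (n and the time series x are assumed example values):

```python
import numpy as np

n = 5
# (n-1) x n difference matrix: D @ x = (x2 - x1, ..., xn - x_{n-1})
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

x = np.array([0.0, 1.0, -1.0, 2.0, 0.0])        # a short "time series"
dirichlet_energy = np.linalg.norm(D @ x) ** 2   # ||Dx||^2 measures wiggliness
```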
 
(2) Addition/subtraction, scalar multiplication
- we can add or subtract matrices of the same size: (A + B)ij = Aij + Bij


- scalar multiplication: (αA)ij = αAij


- many obvious properties, e.g., A + B = B + A, α(A + B) = αA + αB, (A + B) + C = A + (B + C)


- matrix norm (Frobenius norm): for an m × n matrix A, we define ∥A∥ = √(A11² + A12² + ··· + Amn²), the square root of the sum of squares of all entries

- Norm property #1: ∥αA∥ = |α|∥A∥
 - Norm property #2: ∥A + B∥ ≤ ∥A∥ + ∥B∥
 - Norm property #3: ∥A∥ ≥ 0
 - Norm property #4: ∥A∥ = 0 only if A = 0
 - matrix distance: ∥A − B∥
 
(3) Matrix-vector multiplication, linear equations
- matrix-vector product: for an m × n matrix A and n-vector x, y = Ax is the m-vector with yi = Ai1x1 + Ai2x2 + ··· + Ainxn


- Row interpretation: y = Ax can be expressed as yi = biTx, where b1T, …, bmT are the rows of A


- Column interpretation: y = Ax can be expressed as y = x1a1 + x2a2 + ··· + xnan, where a1, …, an are the columns of A
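A numpy check of the row and column interpretations (A and x are assumed example data):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, -1.0]])
x = np.array([1.0, -1.0, 2.0])

y = A @ x                                                       # matrix-vector product
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])     # row interpretation
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))         # column interpretation
assert np.allclose(y, y_rows) and np.allclose(y, y_cols)
```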
 

(4) Matrix multiplication
- Multiplication: can multiply an m × p matrix A and a p × n matrix B to get the m × n matrix C = AB, with Cij = Ai1B1j + Ai2B2j + ··· + AipBpj


- outer product: of an m-vector a and an n-vector b is the m × n matrix abT, with (abT)ij = aibj

- matrix multiplication property #1: (AB)C = A(BC)
 - matrix multiplication property #2: A(B + C) = AB + AC
 - matrix multiplication property #3: (AB)T = BTAT
 - matrix multiplication property #4: AI = A and IA = A
 - matrix multiplication property #5: AB ≠ BA in general
 - Block matrix multiplication: blocks multiply like scalar entries when their sizes are compatible, e.g., [A B; C D][E F; G H] = [AE + BG, AF + BH; CE + DG, CF + DH]


- Inner product interpretation: Cij = aiTbj, the inner product of the i-th row of A and the j-th column of B


- Gram matrix: the Gram matrix of A is G = ATA; it is invertible if and only if the columns of A are linearly independent

- composition: define f(u) = Au and g(v) = Bv,


then, the composition h(x) = f(g(x)) = A(Bx) = (AB)x is the linear function represented by the matrix product AB

- Second difference matrix: Δ = D′D is the (n − 2) × n second difference matrix, where D is the (n − 1) × n difference matrix and D′ the (n − 2) × (n − 1) difference matrix; Δx = (x1 − 2x2 + x3, …, x(n−2) − 2x(n−1) + xn)

- Matrix powers: for square matrix A, A² means AA, and same for higher powers
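A small numpy sketch of matrix multiplication, the transpose rule, the Gram matrix, the outer product, and matrix powers (the random matrices are assumed example data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))

C = A @ B                                   # 4 x 2 product, Cij = sum_k Aik Bkj
assert np.allclose((A @ B).T, B.T @ A.T)    # (AB)T = BT AT

G = A.T @ A                                 # Gram matrix of A
outer = np.outer(A[:, 0], B[0, :])          # outer product a bT of two vectors
G2 = np.linalg.matrix_power(G, 2)           # matrix power: G^2 = GG
```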
 
(5) QR factorization
- Gram-Schmidt: running Gram-Schmidt on the columns a1, …, an of A gives orthonormal vectors q1, …, qn with ai = R1iq1 + R2iq2 + ··· + Riiqi,


where Rij = qiTaj for i < j and Rii = ∥q̃i∥, so A = QR
- QR factorization of A: A = QR, where (1) QTQ = I; (2) R is upper triangular with positive diagonal entries
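A numpy check of the QR factorization (numpy's sign convention does not force the diagonal of R to be positive, so this is a sketch of the factorization rather than the exact convention above; the random matrix is assumed example data):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = np.linalg.qr(A)                      # "reduced" QR: Q is 5 x 3, R is 3 x 3

assert np.allclose(Q.T @ Q, np.eye(3))      # QTQ = I
assert np.allclose(np.triu(R), R)           # R is upper triangular
assert np.allclose(Q @ R, A)                # A = QR
```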
 
(6) Left and right inverses, pseudo-inverse
- Left inverse: a matrix X that satisfies XA = I is called a left inverse of A. If a left inverse exists we say that A is left-invertible
 - Property #1: if A has a left inverse C then the columns of A are linearly independent (if Ax = 0, then x = CAx = C0 = 0)

- Right Inverse: a matrix X that satisfies AX = I is a right inverse of A. If a right inverse exists we say that A is right-invertible.
 - A is right-invertible if and only if AT is left-invertible: AX = I holds if and only if XTAT = I, i.e., XT is a left inverse of AT

Thus, a matrix is right-invertible if and only if its rows are linearly independent
- Pseudo-inverse of a tall matrix: if the columns of A are linearly independent, A† = (ATA)⁻¹AT, which is a left inverse of A


- Pseudo-inverse of a wide matrix: if the rows of A are linearly independent, A† = AT(AAT)⁻¹, which is a right inverse of A


- Pseudo-inverse via QR factorization: if A = QR, then A† = (ATA)⁻¹AT = (RTR)⁻¹RTQT = R⁻¹QT


The pseudo-inverse gives us a method for solving over-determined and under-determined systems of linear equations. If the columns of A are linearly independent and the over-determined equations Ax = b have a solution, then x = A†b is it. If the rows of A are linearly independent, the under-determined equations Ax = b have a solution for any vector b, and x = A†b is a solution.
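A numpy sketch of the pseudo-inverse of a tall matrix (the random matrix is assumed example data; its columns are independent with probability one):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((5, 3))    # tall matrix

A_pinv = np.linalg.pinv(A)                  # pseudo-inverse A†
assert np.allclose(A_pinv @ A, np.eye(3))   # for a tall A, A† is a left inverse

# the same matrix from the formula A† = (ATA)^(-1) AT
A_pinv2 = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_pinv, A_pinv2)
```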
(7) Inverse, solving linear equations
- invertible: if A has a left and a right inverse, they are unique and equal; this common inverse is denoted A⁻¹, and A must be square
 

- suppose A is invertible; then for any b, Ax = b has the unique solution x = A⁻¹b

- inverse property #1: (A⁻¹)⁻¹ = A


- inverse property #2: (αA)⁻¹ = α⁻¹A⁻¹ for a nonzero scalar α


- inverse property #3: (AB)⁻¹ = B⁻¹A⁻¹ when A and B are both invertible


- inverse property #4: (AT)⁻¹ = (A⁻¹)T

- Inverse via QR factorization: if A = QR, then A⁻¹ = (QR)⁻¹ = R⁻¹QT


- The solution of the linear equations Ax = b is then x = A⁻¹b = R⁻¹QTb, computed by forming QTb and applying back substitution with R
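A numpy sketch of solving Ax = b via the QR factorization (the random square matrix and right-hand side are assumed example data):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

x_direct = np.linalg.solve(A, b)            # solve Ax = b directly

Q, R = np.linalg.qr(A)                      # A = QR
x_qr = np.linalg.solve(R, Q.T @ b)          # x = R^(-1) QT b (R is triangular)
assert np.allclose(x_direct, x_qr)
assert np.allclose(np.linalg.inv(A), np.linalg.solve(R, Q.T))   # A^(-1) = R^(-1) QT
```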
 

4. Theory (linear algebra)
(1) Columns of A are independent ⇔ Gram matrix ATA is invertible
(2) A has a left inverse ⇔ columns of A are linearly independent
(3) (square) A is invertible ⇔ columns are independent, or rows are independent
5. Least-squares
(1) Basic least squares problem (Regression)
- residual: r = Ax − b
 - least-squares problem: choose x to minimize ∥Ax − b∥²
 - objective function: ∥Ax − b∥²
 - least-squares approximate solution: the minimizer x̂ is called the least-squares approximate solution of Ax = b
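A numpy sketch of the basic least-squares problem (the over-determined system here is assumed example data):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 3))                 # tall: over-determined system
b = rng.standard_normal(20)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares approximate solution
x_hat2 = np.linalg.pinv(A) @ b                   # the same solution via the pseudo-inverse
assert np.allclose(x_hat, x_hat2)

residual = A @ x_hat - b                         # r = A x̂ - b
objective = np.linalg.norm(residual) ** 2        # ||A x̂ - b||², minimal over all x
```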
 
(2) Solution via pseudo-inverse, QR factorization
(3) Multi-objective least squares via weighted sum
(4) Equality-constrained least squares
(5) Solution via KKT system
6. Fitting models to data
(1) Least squares data fitting
(2) Regression model
- superposition property: f satisfies the superposition property if f(αx + βy) = αf(x) + βf(y); a function that satisfies superposition is called linear. The inner product function f(x) = aTx is linear
 

 - Affine functions: a function that is linear plus a constant is called affine; the general form is f(x) = aTx + b, with a an n-vector and b a scalar
 - Affine approximation: the first-order Taylor approximation of f near the point z is f̂(x) = f(z) + ∇f(z)T(x − z)


This is also, f̂(x) = f(z) + (∂f/∂x1)(z)(x1 − z1) + ··· + (∂f/∂xn)(z)(xn − zn)

Where, ∇f(z) = ( (∂f/∂x1)(z), …, (∂f/∂xn)(z) ) is the gradient of f at z

- Regression model: the regression model is the affine function ŷ = xTβ + v, where the n-vector β (weights) and the scalar v (offset) are the model parameters

- Jacobian matrix: for a vector-valued function f, the Jacobian Df(z) is the matrix of partial derivatives, with Df(z)ij = (∂fi/∂xj)(z)

- First order Taylor approximation around z = 0
 

- associated predictions: ŷ(i) = x(i)Tβ + v, i = 1, …, N, for the data points x(1), …, x(N)

- prediction error or residual (vector): r(i) = y(i) − ŷ(i)
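A numpy sketch of fitting the regression model ŷ = xTβ + v by least squares (the synthetic data, the feature count, and the noise level are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 2))                 # N = 100 examples, n = 2 features
y = X @ np.array([1.5, -2.0]) + 0.3 + 0.01 * rng.standard_normal(100)

# fit the affine model ŷ = xT beta + v by least squares
A = np.column_stack([np.ones(100), X])            # prepend a column of ones for the offset v
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
v, beta = theta[0], theta[1:]

y_hat = X @ beta + v                              # associated predictions
r = y - y_hat                                     # prediction errors (residuals)
```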
 

(3) Validation on a test set
(4) Regularization
(5) Least squares classification
7. Computational complexity
(1) Floating-point operation (flop)
- flops: basic arithmetic operations (+, −, ∗, /, . . .) are called floating point operations or flops.
 - running speed: current computers run on the order of 1 Gflop/sec (10⁹ flops per second) or faster
 
(2) Vector-vector operations (inner product, norm, ...)
- addition, subtraction: n flops
 - scalar multiplication: n flops
 - inner product: 2n − 1 ≈ 2n flops
 - norm: 2n − 1 ≈ 2n flops
 
(3) Matrix-vector multiplication, matrix-matrix multiplication
- matrix addition, scalar-matrix multiplication cost mn flops
 - matrix-vector multiplication costs m(2n − 1) ≈ 2mn flops
 - matrix-matrix multiplication costs mn(2p − 1) ≈ 2mnp flops
 
(4) QR factorization
- QR factorization of an m × n matrix costs ≈ 2mn² flops in total (2n³ when the matrix is square)
 
(5) Inverse, solving linear equations

(6) Least-squares

(7) Linearly constrained least squares
(8) Backsubstitution
- back substitution total is 1+3+···+(2n−1) = n² flops
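A short Python implementation of back substitution for an upper-triangular system (the example R and b are assumed data); the loop iterations cost 1, 3, …, 2n − 1 flops, which sum to n² as counted above:

```python
import numpy as np

def back_substitution(R, b):
    # Solve Rx = b for an upper-triangular R with nonzero diagonal entries.
    n = b.size
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
b = np.array([1.0, 2.0, 8.0])
assert np.allclose(back_substitution(R, b), np.linalg.solve(R, b))
```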