B <- matrix(c(5, 6, 7, 8), nrow = 2)
AB <- A %*% B # matrix multiplication
t(AB) # transpose of product
[,1] [,2]
[1,] 23 34
[2,] 31 46
t(B) %*% t(A) # product of transposes
[,1] [,2]
[1,] 23 34
[2,] 31 46
E.3 Matrix Multiplication
E.3.1 Definition and Computation
The matrix product \(C = AB\) is defined when the number of columns in \(A\) equals the number of rows in \(B\). If \(A\) is an \(n \times p\) matrix and \(B\) is a \(p \times q\) matrix, then the product \(C = AB\) is an \(n \times q\) matrix.
The \((i, j)\)-entry of \(C\) is computed as the dot product of the \(i\)th row of \(A\) and the \(j\)th column of \(B\):
\[
C_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}
\]
That is, to compute the entry in the \(i\)th row and \(j\)th column of \(C\), multiply corresponding elements from the \(i\)th row of \(A\) and the \(j\)th column of \(B\), and then sum the results.
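As a quick check of this formula (the matrices below are just for illustration), a single entry can be computed by hand with sum() and compared against the %*% operator:
A <- matrix(c(1, 2, 3, 4), nrow = 2) # 2 x 2
B <- matrix(c(5, 6, 7, 8), nrow = 2) # 2 x 2
sum(A[1, ] * B[, 2]) # dot product of row 1 of A and column 2 of B: 31
(A %*% B)[1, 2]      # same entry from the built-in operator: 31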
Example E.4 Let \(X\) be a \(3 \times 2\) matrix and \(\beta\) a \(2 \times 1\) vector. Then \(X\beta\) is a \(3 \times 1\) vector. \(X'X\) is \(2 \times 2\) and symmetric.
We have a couple of familiar matrix multiplications from linear regression. \(X\beta\) is especially important to us!
X <- matrix(c(1, 1, 1, 2, 3, 4), nrow = 3)
beta <- c(0.5, 1)
X %*% beta # linear prediction
[,1]
[1,] 2.5
[2,] 3.5
[3,] 4.5
t(X) %*% X # 2 x 2 matrix
[,1] [,2]
[1,] 3 9
[2,] 9 29
E.3.2 Rules
Associative: \((AB)C = A(BC)\)
Distributive: \(A(B + C) = AB + AC\)
Not commutative: \(AB \neq BA\) in general
Example E.5
A <- matrix(c(1, 2, 3, 4), 2) # 2 x 2
B <- matrix(c(5, 6, 7, 8), 2) # 2 x 2
C <- matrix(c(100, 200, 300, 400), 2) # 2 x 2

# same
(A %*% B) %*% C
[,1] [,2]
[1,] 8500 19300
[2,] 12600 28600
A %*% (B %*% C)
[,1] [,2]
[1,] 8500 19300
[2,] 12600 28600
# same
A %*% (B + C)
[,1] [,2]
[1,] 723 1531
[2,] 1034 2246
(A %*% B) + (A %*% C)
[,1] [,2]
[1,] 723 1531
[2,] 1034 2246
# different
A %*% B
[,1] [,2]
[1,] 23 31
[2,] 34 46
B %*% A
[,1] [,2]
[1,] 19 43
[2,] 22 50
E.4 Special Matrices
E.4.1 Identity Matrix
An identity matrix is a square matrix with 1s on the diagonal and 0s elsewhere. Denoted \(I\).
\(AI = A\)
\(IA = A\)
\(I_n\) is \(n \times n\)
I <- diag(3) # shortcut to make 3x3 identity matrix
I
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
A <- matrix(1:9, nrow = 3)
A %*% I # same as A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
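The text above notes that \(IA = A\) as well; a quick check (a small addition, continuing with the same \(A\) and \(I\)):
I %*% A # also the same as A
all(I %*% A == A) # TRUE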
E.4.2 Diagonal Matrix
A diagonal matrix has nonzero entries only on the diagonal. These matrices are often used for variances or weights.
d <- c(2, 4, 6)
D <- diag(d) # shortcut to make diagonal matrix
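To see why diagonal matrices are handy for weights (a minimal sketch; the \(X\) below is just the 3 x 2 matrix used earlier), pre-multiplying by \(D\) rescales each row by the corresponding diagonal entry:
X <- matrix(c(1, 1, 1, 2, 3, 4), nrow = 3)
D %*% X # row i of X is scaled by d[i]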
E.4.3 Symmetric Matrix
A matrix \(A\) is symmetric if \(A' = A\).
\(X'X\) is always symmetric.
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
t(X) %*% X # symmetric
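A quick way to verify the symmetry in R (an extra check, not part of the original example) is isSymmetric(), or comparing the matrix to its transpose:
XtX <- t(X) %*% X
isSymmetric(XtX)   # TRUE
all(XtX == t(XtX)) # the same check by hand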
E.5 Matrix Inverse
The inverse of a square matrix \(A\) is a matrix \(A^{-1}\) such that \(AA^{-1} = A^{-1}A = I\).
Not all square matrices have an inverse.
The inverse exists only if the matrix is full rank.
A <- matrix(c(2, 1, 1, 1), 2)
solve(A) # A-inverse
[,1] [,2]
[1,] 1 -1
[2,] -1 2
A %*% solve(A) # should be identity
[,1] [,2]
[1,] 1 0
[2,] 0 1
E.6 Full Rank
A matrix has full rank if its columns are linearly independent.
An \(n \times p\) matrix has full column rank if its rank equals \(p\).
A square matrix is invertible if and only if it is full rank.
B <- matrix(c(1, 2, 2, 4), 2)
qr(B)$rank # use QR decomposition to find rank
[1] 1
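For contrast (an added example), a 2 x 2 matrix whose columns are not multiples of each other has full rank:
B2 <- matrix(c(1, 2, 3, 5), 2) # columns are linearly independent
qr(B2)$rank # 2, so B2 is full rank (and invertible)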
E.7 Why Rank Matters
If \(X\) is not full rank, \(X'X\) is not invertible.
In regression, full rank \(X\) ensures a unique \(\hat\beta\).
If the rank is less than the number of columns, then one variable is perfectly collinear with another (or with a linear combination of the other columns).
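To illustrate (a sketch with made-up data): if one column of \(X\) is an exact multiple of another, \(X'X\) is singular and solve() fails.
x1 <- c(1, 2, 3, 4)
X <- cbind(1, x1, 2 * x1) # third column is perfectly collinear with the second
qr(X)$rank # 2, less than the 3 columns
try(solve(t(X) %*% X)) # error: X'X is singular, so no unique beta-hat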
E.8 Examples
Example E.8 Let \(A = \begin{bmatrix}4 & 7 \\ 2 & 6\end{bmatrix}\). Compute \(A^{-1}\).
A <- matrix(c(4, 2, 7, 6), 2)
solve(A)
[,1] [,2]
[1,] 0.6 -0.7
[2,] -0.2 0.4
Example E.9 Let \(C = \begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}\). Is \(C\) invertible?
C <- matrix(c(1, 2, 2, 4), 2)
qr(C)$rank # rank < n
[1] 1
We can see that the second column is simply 2 times the first column, so the matrix is not full rank and thus not invertible.
E.9 Matrix Calculus (Introductory)
Matrix calculus helps compute gradients and Hessians of functions involving vectors and matrices.
Used in optimization, MLE, and Bayesian computation
Especially common in linear models
E.9.1 Gradients
If \(f(x)\) is a scalar-valued function of a vector \(x\), then the gradient of \(f\) with respect to \(x\), denoted \(\frac{\partial f}{\partial x}\) or \(\nabla_x f\), is a vector containing the partial derivatives of \(f\) with respect to each component of \(x\), so that
\[
\nabla_x f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}.
\]
E.9.2 Hessians
If \(f(x)\) is a scalar-valued function of a vector \(x\), then the Hessian of \(f\) with respect to \(x\), denoted \(\frac{\partial^2 f}{\partial x \partial x'}\), is an \(n \times n\) matrix of second-order partial derivatives:
\[
\frac{\partial^2 f}{\partial x \partial x'} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.
\]
The Hessian describes the curvature of \(f(x)\) and is used in second-order optimization methods (like Newton’s method) and in statistical approximations (e.g., Laplace approximation).
Example E.12 Let \(f(x) = x_1^2 + 3x_2\), where \(x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\). Then
\[
\nabla_x f = \begin{bmatrix} 2x_1 \\ 3 \end{bmatrix},
\qquad
\frac{\partial^2 f}{\partial x \partial x'} = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}.
\]
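A small numerical check of this example (a sketch using simple central differences rather than any particular package):
f <- function(x) x[1]^2 + 3 * x[2]

# central-difference approximation to the gradient
num_grad <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(i) {
    e <- replace(numeric(length(x)), i, h)
    (f(x + e) - f(x - e)) / (2 * h)
  })
}

num_grad(f, c(2, 5)) # approximately c(2 * 2, 3) = c(4, 3)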
Vectors and matrices: Represent the outcome variable \(y\), a matrix of explanatory variables \(X\) (usually with a column of ones as its first column), parameters \(\beta\), or residuals \(\epsilon\).
Matrix multiplication:
Linear predictor: \(X\beta\)
OLS and ML estimate of the normal-linear model: \(\hat{\beta} = (X'X)^{-1}X'y\) (see the sketch after this list)
Transpose:
Quadratic loss: \((y - X\beta)'(y - X\beta)\)
Covariance formulas: \(X'\Sigma^{-1}X\)
Special matrices:
Identity matrix \(I\): appears in prior variances, regularization, ridge regression
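As referenced above, here is a minimal sketch (with simulated data) that ties these pieces together, computing the OLS estimate by the matrix formula, the linear predictor, and the quadratic loss, and checking the result against lm():
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 + 1 * x + rnorm(n)

X <- cbind(1, x)                             # design matrix with a column of ones
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y # (X'X)^{-1} X'y
beta_hat
coef(lm(y ~ x))                              # matches the matrix formula
head(X %*% beta_hat)                         # first few values of the linear predictor
sum((y - X %*% beta_hat)^2)                  # quadratic loss (y - Xb)'(y - Xb)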