A system of linear equations can be represented compactly as an augmented matrix \([A|\mathbf{b}]\), where \(A\) is the coefficient matrix and \(\mathbf{b}\) is the right-hand side vector.
Elementary Row Operations (EROs) are three operations on the rows of an augmented matrix, each of which preserves the solution set:
Swap: Interchange two rows \(R_i \leftrightarrow R_j\)
Scale: Multiply a row by a nonzero constant \(kR_i \to R_i\), where \(k \neq 0\)
Add: Add a multiple of one row to another \(R_i + kR_j \to R_i\)
Row Echelon Form (REF) and Reduced Row Echelon Form (RREF)
A matrix is in Row Echelon Form (REF) if:
All zero rows are at the bottom
The leading (first nonzero) entry of each nonzero row is to the right of the leading entry of the row above it
All entries below a leading entry are zero
A matrix is in Reduced Row Echelon Form (RREF) if additionally:
Each leading entry equals 1 (called a pivot)
Each pivot is the only nonzero entry in its column
Gaussian and Gauss-Jordan Elimination
Gaussian Elimination: Use EROs to reduce the augmented matrix to REF, then use back-substitution to find the solution.
Gauss-Jordan Elimination: Continue reducing from REF to RREF. The solution can be read directly without back-substitution.
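The reduction to RREF can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `rref` and the use of exact `Fraction` arithmetic (to avoid rounding) are choices made here, and the small system in the example is invented for the demonstration.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of `matrix` (a list of rows)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Swap: find a row at or below pivot_row with a nonzero entry in this column.
        source = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if source is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[source] = rows[source], rows[pivot_row]
        # Scale: make the leading entry equal to 1.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Add: clear every other entry in the pivot column.
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows

# Illustrative system: x + y = 3, x - y = 1, with solution x = 2, y = 1.
augmented = [[1, 1, 3], [1, -1, 1]]
reduced = rref(augmented)  # rows [1, 0, 2] and [0, 1, 1]
```

The solution is read from the last column of `reduced`, with no back-substitution, as described for Gauss-Jordan elimination.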
Classification of Solutions
After row reduction, a system is:
Inconsistent (no solution): if a row of the form \([0 \; 0 \; \cdots \; 0 \;|\; c]\) with \(c \neq 0\) appears
Consistent with a unique solution: if no such row appears and every column of the coefficient part (every column except the last) has a pivot
Consistent with infinitely many solutions: if consistent and there are free variables (columns without pivots)
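The three cases above can be checked mechanically once the augmented matrix has been row reduced. The sketch below assumes its input is already in REF or RREF; the function name `classify` and the sample matrices are illustrative only.

```python
def classify(reduced_augmented):
    """Classify a system from its row-reduced augmented matrix [A | b]."""
    n_unknowns = len(reduced_augmented[0]) - 1
    pivots = 0
    for row in reduced_augmented:
        coefficients, rhs = row[:-1], row[-1]
        if any(c != 0 for c in coefficients):
            pivots += 1            # each nonzero coefficient row holds one pivot
        elif rhs != 0:
            return "no solution"   # a row [0 ... 0 | c] with c != 0
    if pivots == n_unknowns:
        return "unique solution"
    return "infinitely many solutions"  # consistent, with free variables

unique = classify([[1, 0, 2], [0, 1, 1]])
inconsistent = classify([[1, 1, 0], [0, 0, 1]])
infinite = classify([[1, 1, 2], [0, 0, 0]])
```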
Parametric Systems
When the system contains a parameter (e.g., \(a\)), row reduce and examine the pivots to classify solutions based on the parameter value.
Method for parametric systems: Row reduce the augmented matrix. Examine the last meaningful row, which typically takes the form \([0 \; 0 \; \cdots \; f(a) \;|\; g(a)]\):
No solution: \(f(a) = 0\) and \(g(a) \neq 0\)
Unique solution: \(f(a) \neq 0\)
Infinitely many solutions: \(f(a) = 0\) and \(g(a) = 0\)
Homogeneous Systems
A homogeneous system \(A\mathbf{x} = \mathbf{0}\) is always consistent (since \(\mathbf{x} = \mathbf{0}\) is always a solution, called the trivial solution).
If a homogeneous system of \(m\) equations in \(n\) unknowns has more unknowns than equations (\(n > m\)), then it has infinitely many solutions: there are at most \(m\) pivots, so at least \(n - m\) variables are free.
General Solution Structure
The general solution of \(A\mathbf{x} = \mathbf{b}\) can be written as:
\[ \mathbf{x} = \mathbf{x}_p + \mathbf{x}_h \]
where \(\mathbf{x}_p\) is any particular solution of \(A\mathbf{x} = \mathbf{b}\), and \(\mathbf{x}_h\) is the general solution of the homogeneous system \(A\mathbf{x} = \mathbf{0}\).
Parametric system: For what values of \(a\) does the following system have no solution, a unique solution, or infinitely many solutions?
\[ x + y + z = -1 \]
\[ x + 2y + az = 2a \]
\[ x + ay + 2z = -2 \]
Solution: Form the augmented matrix and row reduce:
\[ \begin{bmatrix} 1 & 1 & 1 & | & -1 \\ 1 & 2 & a & | & 2a \\ 1 & a & 2 & | & -2 \end{bmatrix} \]
\(R_2 - R_1 \to R_2\), \(R_3 - R_1 \to R_3\):
\[ \begin{bmatrix} 1 & 1 & 1 & | & -1 \\ 0 & 1 & a-1 & | & 2a+1 \\ 0 & a-1 & 1 & | & -1 \end{bmatrix} \]
\(R_3 - (a-1)R_2 \to R_3\):
\[ \begin{bmatrix} 1 & 1 & 1 & | & -1 \\ 0 & 1 & a-1 & | & 2a+1 \\ 0 & 0 & 1-(a-1)^2 & | & -1-(a-1)(2a+1) \end{bmatrix} \]
Simplify the last row: the pivot entry is \(1 - (a-1)^2 = -(a^2 - 2a) = -a(a-2)\) and the RHS is \(-1 - (2a^2 - a - 1) = -2a^2 + a = -a(2a-1)\).
Analysis:
If \(a \neq 0\) and \(a \neq 2\): pivot \(\neq 0\), so unique solution
If \(a = 0\): pivot \(= 0\), RHS \(= -0(2(0)-1) = 0\), so infinitely many solutions
If \(a = 2\): pivot \(= 0\), RHS \(= -2(2(2)-1) = -2(3) = -6 \neq 0\), so no solution
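As a quick sanity check on the analysis, the last-row entries can be evaluated for specific values of \(a\). The helper name `classify_parametric` is hypothetical; it simply encodes the pivot and RHS expressions derived above.

```python
def classify_parametric(a):
    """Classify the example system for a given value of the parameter a."""
    pivot = 1 - (a - 1) ** 2           # simplifies to -a(a - 2)
    rhs = -1 - (a - 1) * (2 * a + 1)   # simplifies to -a(2a - 1)
    if pivot != 0:
        return "unique solution"
    return "infinitely many solutions" if rhs == 0 else "no solution"

results = {a: classify_parametric(a) for a in (0, 1, 2)}
```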
Key properties: In general \(AB \neq BA\). Also \((AB)^T = B^T A^T\) (order reverses).
Symmetric Matrices
A square matrix \(A\) is symmetric if \(A = A^T\).
If \(A, B\) are symmetric, then \(A + B\) is symmetric.
If \(A\) is invertible and symmetric, then \(A^{-1}\) is also symmetric.
Proof of second claim: \((A^{-1})^T = (A^T)^{-1} = A^{-1}\) since \(A^T = A\).
Matrix Inverse
A square matrix \(A\) is invertible if there exists a matrix \(A^{-1}\) such that \(AA^{-1} = A^{-1}A = I\). The inverse is unique.
Properties of inverses:
\((A^{-1})^{-1} = A\)
\((AB)^{-1} = B^{-1}A^{-1}\) (order reverses!)
\((A^T)^{-1} = (A^{-1})^T\)
\((cA)^{-1} = \frac{1}{c}A^{-1}\) for any scalar \(c \neq 0\)
2x2 Inverse Formula
For \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\), if \(ad - bc \neq 0\):
\[ A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]
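The formula translates directly into code. The following is a small sketch with a hypothetical function name `inverse_2x2`; it uses `Fraction` so the entries stay exact.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Invert [[a, b], [c, d]] using the ad - bc formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible (ad - bc = 0)")
    scale = Fraction(1) / Fraction(det)
    # Swap the diagonal entries, negate the off-diagonal entries, divide by det.
    return [[d * scale, -b * scale], [-c * scale, a * scale]]

example = inverse_2x2([[1, 2], [3, 4]])  # det = -2
```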
Solving Matrix Equations
To solve equations like \(AXB = C\) for \(X\):
Left-multiply by \(A^{-1}\): \(XB = A^{-1}C\)
Right-multiply by \(B^{-1}\): \(X = A^{-1}CB^{-1}\)
Be careful with order! You cannot write \(X = A^{-1}B^{-1}C\) because matrix multiplication is not commutative.
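A small worked instance, with matrices chosen here purely for illustration: take \(A = \begin{bmatrix}1&1\\0&1\end{bmatrix}\), \(B = \begin{bmatrix}2&0\\0&1\end{bmatrix}\), \(C = \begin{bmatrix}2&3\\4&5\end{bmatrix}\), so that \(A^{-1} = \begin{bmatrix}1&-1\\0&1\end{bmatrix}\) and \(B^{-1} = \begin{bmatrix}1/2&0\\0&1\end{bmatrix}\). Then
\[ X = A^{-1}CB^{-1} = \begin{bmatrix}-2&-2\\4&5\end{bmatrix}\begin{bmatrix}1/2&0\\0&1\end{bmatrix} = \begin{bmatrix}-1&-2\\2&5\end{bmatrix} \]
Check: \(AXB = \begin{bmatrix}1&3\\2&5\end{bmatrix}\begin{bmatrix}2&0\\0&1\end{bmatrix} = \begin{bmatrix}2&3\\4&5\end{bmatrix} = C\). The wrong order gives \(A^{-1}B^{-1}C = \begin{bmatrix}-3&-7/2\\4&5\end{bmatrix} \neq X\).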
Elementary Matrices
An elementary matrix is obtained by performing exactly one ERO on the identity matrix \(I\). There are three types:
Type I (Swap): \(E\) swaps rows \(i\) and \(j\) of \(I\)
Type II (Scale): \(E\) multiplies row \(i\) of \(I\) by \(k \neq 0\)
Type III (Add): \(E\) adds \(k\) times row \(j\) to row \(i\) in \(I\)
Key property: Left-multiplying \(A\) by elementary matrix \(E\) performs the corresponding ERO on \(A\). That is, \(EA\) equals the result of applying that ERO to \(A\).
Inverses of elementary matrices:
Swap: \(E^{-1} = E\) (swap again undoes it)
Scale by \(k\): \(E^{-1}\) scales by \(1/k\)
Add \(k \times R_j\) to \(R_i\): \(E^{-1}\) adds \(-k \times R_j\) to \(R_i\)
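The key property and the inverse rule can be seen on a small example. This sketch covers only the Type III (Add) case; the helper names `identity`, `matmul`, and `add_multiple` are invented for the illustration, and rows are indexed from 0 as is usual in Python.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add_multiple(n, i, j, k):
    """Elementary matrix for R_i + k R_j -> R_i (0-based rows, i != j)."""
    e = identity(n)
    e[i][j] = k
    return e

A = [[1, 2], [3, 4]]
E = add_multiple(2, 1, 0, -3)     # second row minus 3 times first row
E_inv = add_multiple(2, 1, 0, 3)  # the inverse adds 3 times the first row back

reduced = matmul(E, A)            # same as applying the ERO to A directly
undone = matmul(E_inv, reduced)   # recovers A
```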
Expressing A as a Product of Elementary Matrices
If \(E_k \cdots E_2 E_1 A = I\), then:
\[ A^{-1} = E_k \cdots E_2 E_1 \]
\[ A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \]
Gauss-Jordan Method for Finding Inverses
To find \(A^{-1}\):
Form the augmented matrix \([A \;|\; I]\)
Row reduce until the left half becomes \(I\)
The right half is \(A^{-1}\): \([A \;|\; I] \to [I \;|\; A^{-1}]\)
If you cannot reduce the left half to \(I\), then \(A\) is not invertible.
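The procedure above might be coded as follows. This is a sketch under the assumption of exact rational arithmetic; the function name `inverse` is chosen here and is not taken from any library.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by row reducing [A | I]; raise if A is singular."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("left half cannot be reduced to I: not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]      # swap
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]          # scale
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]  # add
    return [row[n:] for row in aug]  # the right half is the inverse

A_inv = inverse([[2, 1], [5, 3]])
```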
Fundamental Theorem of Invertible Matrices (Part 1)
For an \(n \times n\) matrix \(A\), the following are equivalent:
\(A\) is invertible
\(A\mathbf{x} = \mathbf{b}\) has a unique solution for every \(\mathbf{b} \in \mathbb{R}^n\)
\(A\mathbf{x} = \mathbf{0}\) has only the trivial solution \(\mathbf{x} = \mathbf{0}\)
The RREF of \(A\) is \(I_n\)
\(A\) can be expressed as a product of elementary matrices
For an \(n \times n\) matrix \(A\), the minor \(M_{ij}\) is the determinant of the \((n-1)\times(n-1)\) submatrix obtained by deleting row \(i\) and column \(j\). The cofactor is \(C_{ij} = (-1)^{i+j} M_{ij}\).
The determinant can be computed by expanding along any row \(i\) or column \(j\):
\[ \det(A) = \sum_{j=1}^{n} a_{ij} C_{ij} = \sum_{i=1}^{n} a_{ij} C_{ij} \]
Tip: Choose the row or column with the most zeros for efficiency.
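Cofactor expansion is naturally recursive. The sketch below always expands along the first row and skips zero entries, which is a simplification of the tip above; the name `det_cofactor` is hypothetical. It is fine for small matrices but grows factorially in cost.

```python
def det_cofactor(m):
    """Determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        if m[0][j] == 0:
            continue  # a zero entry contributes nothing
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        # With 0-based j, the cofactor sign (-1)^(1 + (j + 1)) equals (-1)^j.
        total += (-1) ** j * m[0][j] * det_cofactor(minor)
    return total

d2 = det_cofactor([[1, 2], [3, 4]])
d3 = det_cofactor([[2, 0, 1], [1, 3, 0], [0, 1, 4]])
```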
Properties of Determinants
Effects of EROs on determinants:
Swap two rows: \(\det \to -\det\)
Scale a row by \(k\): \(\det \to k \cdot \det\)
Add a multiple of one row to another: \(\det\) unchanged
Key determinant identities:
Triangular matrix: \(\det(A) = \) product of diagonal entries
\(\det(AB) = \det(A)\det(B)\)
\(\det(A^{-1}) = 1/\det(A)\)
\(\det(cA) = c^n \det(A)\) for \(n \times n\) matrix \(A\)
\(\det(A^T) = \det(A)\)
Algorithm: Computing Determinants via Row Reduction
Row reduce \(A\) to upper triangular form (REF), tracking all swaps and scalings.
Compute: \[\det(A) = (-1)^s \cdot \frac{1}{\text{product of scaling constants}} \cdot \text{product of diagonal entries of REF}\]
where \(s\) = number of row swaps performed.
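A sketch of this algorithm follows. To keep it short it uses only swaps and row additions, never scalings, so the product of scaling constants in the formula is 1; the function name `det_by_elimination` is an assumption of this example.

```python
from fractions import Fraction

def det_by_elimination(matrix):
    """Determinant via reduction to upper triangular form, counting swaps."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    swaps = 0
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero ends up on the diagonal
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            swaps += 1          # each swap flips the sign
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    product = Fraction(1)
    for i in range(n):
        product *= rows[i][i]
    return (-1) ** swaps * product

d = det_by_elimination([[0, 2], [3, 4]])  # one swap is needed
```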
\(A\) is invertible \(\iff\) \(\det(A) \neq 0\).
Trace
The trace of a square matrix \(A\) is the sum of its diagonal entries:
\[ \operatorname{tr}(A) = \sum_{i=1}^n a_{ii} \]
Properties:
\(\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)\)
\(\operatorname{tr}(cA) = c \cdot \operatorname{tr}(A)\)
\(\operatorname{tr}(AB) = \operatorname{tr}(BA)\) for square matrices \(A, B\) of the same size
Prove: There are no \(2 \times 2\) matrices \(A, B\) such that \(AB - BA = I\).
Proof: Suppose \(AB - BA = I\). Take the trace of both sides:
\[ \operatorname{tr}(AB - BA) = \operatorname{tr}(I) \]
\[ \operatorname{tr}(AB) - \operatorname{tr}(BA) = 2 \]
But \(\operatorname{tr}(AB) = \operatorname{tr}(BA)\), so the left side is 0. This gives \(0 = 2\), a contradiction. Therefore no such matrices exist.
(This argument works for any \(n \times n\) case: \(0 = n\), contradiction.)
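The identity \(\operatorname{tr}(AB) = \operatorname{tr}(BA)\) used in the proof follows from swapping the order of summation. For \(n \times n\) matrices \(A = (a_{ij})\) and \(B = (b_{ij})\):
\[ \operatorname{tr}(AB) = \sum_{i=1}^n (AB)_{ii} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} b_{ji} = \sum_{j=1}^n \sum_{i=1}^n b_{ji} a_{ij} = \sum_{j=1}^n (BA)_{jj} = \operatorname{tr}(BA) \]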
Linear Combination and Span
A linear combination of vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_k\) is any vector of the form \(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k\) where \(c_i \in \mathbb{R}\).
The span of a set of vectors is the set of all linear combinations: \(\operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \{c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k : c_i \in \mathbb{R}\}\).
Linear Independence
Vectors \(\mathbf{v}_1, \ldots, \mathbf{v}_k\) are linearly independent if the only solution to \(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}\) is \(c_1 = c_2 = \cdots = c_k = 0\).
Otherwise they are linearly dependent (at least one vector can be written as a linear combination of the others).
Subspaces
A nonempty subset \(W\) of \(\mathbb{R}^n\) is a subspace if:
\(\mathbf{0} \in W\) (contains the zero vector)
If \(\mathbf{u}, \mathbf{v} \in W\), then \(\mathbf{u} + \mathbf{v} \in W\) (closed under addition)
If \(\mathbf{u} \in W\) and \(c \in \mathbb{R}\), then \(c\mathbf{u} \in W\) (closed under scalar multiplication)
Examples of subspaces: Planes through the origin, lines through the origin, \(\{\mathbf{0}\}\), \(\mathbb{R}^n\) itself.
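Non-subspaces: Sets not containing \(\mathbf{0}\) (e.g., \(\{(x,y): x + y = 1\}\)); sets that contain \(\mathbf{0}\) but are not closed under the operations (e.g., the disk \(\{(x,y): x^2 + y^2 \leq 1\}\), which contains \((1,0)\) but not \(2(1,0) = (2,0)\)).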
A basis for a subspace \(W\) is a set of vectors that is both:
Spanning: every vector in \(W\) can be written as a linear combination of the basis vectors
Linearly independent
The dimension of \(W\) is the number of vectors in any basis for \(W\).
The standard basis for \(\mathbb{R}^n\) is \(\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}\) where \(\mathbf{e}_i\) has 1 in position \(i\) and 0 elsewhere.
Row Space, Column Space, and Null Space
For an \(m \times n\) matrix \(A\):
Row space: \(\operatorname{row}(A) = \operatorname{span}\{\text{rows of } A\} \subseteq \mathbb{R}^n\)
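Column space: \(\operatorname{col}(A) = \operatorname{span}\{\text{columns of } A\} \subseteq \mathbb{R}^m\)
Null space: \(\operatorname{null}(A) = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0}\} \subseteq \mathbb{R}^n\)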
Basis for row space: The nonzero rows of the RREF.
Basis for column space: The columns of the original matrix \(A\) corresponding to the pivot columns of the RREF.
Basis for null space: Solve \(A\mathbf{x} = \mathbf{0}\). Set each free variable = 1 (others = 0) one at a time; the resulting vectors form a basis.
Important: For the column space, use columns of the ORIGINAL matrix, not the RREF! Row reduction changes the column space but preserves the row space.
Rank and Nullity
Rank: \(\operatorname{rank}(A) = \dim(\operatorname{row}(A)) = \dim(\operatorname{col}(A)) = \) number of pivots in RREF
Nullity: \(\operatorname{nullity}(A) = \dim(\operatorname{null}(A)) = \) number of free variables
Rank-Nullity Theorem: For an \(m \times n\) matrix \(A\):
\[ \operatorname{rank}(A) + \operatorname{nullity}(A) = n \]
where \(n\) is the number of columns.
Orthogonal and Orthonormal Sets
A set of vectors \(\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}\) is orthogonal if \(\mathbf{v}_i \cdot \mathbf{v}_j = 0\) for all \(i \neq j\).
It is orthonormal if additionally each vector has unit length: \(\|\mathbf{v}_i\| = 1\) for all \(i\).
An orthogonal set of nonzero vectors is linearly independent.
A matrix \(Q\) has orthonormal columns \(\iff\) \(Q^T Q = I\).
Change of Basis
Let \(B = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}\) be a basis for \(\mathbb{R}^n\), and let \(P_B = [\mathbf{b}_1 | \cdots | \mathbf{b}_n]\) be the matrix whose columns are the basis vectors.
If \([\mathbf{x}]_B\) denotes the coordinate vector of \(\mathbf{x}\) with respect to basis \(B\), then:
\[ \mathbf{x} = P_B [\mathbf{x}]_B \quad \Longrightarrow \quad [\mathbf{x}]_B = P_B^{-1} \mathbf{x} \]
To convert between two bases \(A\) and \(B\):
\[ [\mathbf{x}]_B = P_B^{-1} P_A [\mathbf{x}]_A \]
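For an orthogonal basis \(\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}\), the coordinates of a vector \(\mathbf{u} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n\) are
\[ c_i = \frac{\mathbf{u} \cdot \mathbf{v}_i}{\mathbf{v}_i \cdot \mathbf{v}_i} \]
For an orthonormal basis this simplifies to \(c_i = \mathbf{u} \cdot \mathbf{v}_i\), since \(\mathbf{v}_i \cdot \mathbf{v}_i = 1\).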
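The orthogonal-basis formula avoids inverting \(P_B\) altogether. The sketch below uses an illustrative orthogonal basis of \(\mathbb{R}^2\) and hypothetical helper names; it assumes the supplied basis really is orthogonal and does not check this.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coordinates_in_orthogonal_basis(u, basis):
    """Coordinates c_i = (u . v_i) / (v_i . v_i); basis must be orthogonal."""
    return [Fraction(dot(u, v)) / Fraction(dot(v, v)) for v in basis]

basis = [(1, 1), (1, -1)]  # orthogonal: (1)(1) + (1)(-1) = 0
u = (3, 1)
coords = coordinates_in_orthogonal_basis(u, basis)

# Rebuild u from its coordinates as a check.
rebuilt = [sum(c * v[k] for c, v in zip(coords, basis)) for k in range(len(u))]
```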
Find the rank, nullity, and bases for all fundamental subspaces of:
\[ A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 2 & 4 & 1 & 3 \\ 3 & 6 & 1 & 4 \end{bmatrix} \]
Solution: Row reduce to RREF:
\[ R_2 - 2R_1,\; R_3 - 3R_1: \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \]
\[ R_3 - R_2: \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
Pivots are in columns 1 and 3. Free variables: \(x_2, x_4\).
Rank = 2, Nullity = 4 - 2 = 2. Check: rank + nullity = 2 + 2 = 4 = n.
Basis for row space: \(\{(1,2,0,1),\; (0,0,1,1)\}\) (nonzero rows of RREF)
Basis for column space: Columns 1 and 3 of original \(A\): \(\left\{\begin{pmatrix}1\\2\\3\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}\right\}\)
Basis for null space: Solve \(A\mathbf{x}=\mathbf{0}\):
From RREF: \(x_1 = -2x_2 - x_4\), \(x_3 = -x_4\). Set \(x_2=1, x_4=0\): \((-2,1,0,0)\). Set \(x_2=0, x_4=1\): \((-1,0,-1,1)\).
Basis: \(\{(-2,1,0,0),\; (-1,0,-1,1)\}\)
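The worked example can be reproduced programmatically. This is a sketch with invented helper names (`rref_with_pivots`, `null_space_basis`) that follows the recipe in these notes: pivot columns give the rank, and each free variable set to 1 gives one null space basis vector.

```python
from fractions import Fraction

def rref_with_pivots(matrix):
    """Return (RREF rows, list of 0-based pivot column indices)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for col in range(len(rows[0])):
        source = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if source is None:
            continue  # free column
        rows[r], rows[source] = rows[source], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r:
                factor = rows[i][col]
                rows[i] = [x - factor * p for x, p in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def null_space_basis(matrix):
    reduced, pivots = rref_with_pivots(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        vector = [Fraction(0)] * n
        vector[free] = Fraction(1)
        # Each pivot variable equals minus the free column's entry in its row.
        for row_index, pivot_col in enumerate(pivots):
            vector[pivot_col] = -reduced[row_index][free]
        basis.append(vector)
    return basis

A = [[1, 2, 0, 1], [2, 4, 1, 3], [3, 6, 1, 4]]
reduced, pivots = rref_with_pivots(A)
rank = len(pivots)
nullity = len(A[0]) - rank
column_space_basis = [[row[c] for row in A] for c in pivots]  # columns of the ORIGINAL A
kernel_basis = null_space_basis(A)
```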
A scalar \(\lambda\) is an eigenvalue of a square matrix \(A\) if there exists a nonzero vector \(\mathbf{v}\) such that:
\[ A\mathbf{v} = \lambda\mathbf{v} \]
The nonzero vector \(\mathbf{v}\) is called an eigenvector corresponding to \(\lambda\).
Finding Eigenvalues
Form the characteristic equation \(\det(A - \lambda I) = 0\), whose left side is the characteristic polynomial
Solve for \(\lambda\) (the eigenvalues)
For each eigenvalue \(\lambda\), find the eigenspace \(E_\lambda = \operatorname{null}(A - \lambda I)\) by solving \((A - \lambda I)\mathbf{x} = \mathbf{0}\)
For a 2x2 matrix \(A\), the characteristic equation is:
\[ \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \]
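For the 2x2 case the eigenvalues follow from the quadratic formula applied to the trace and determinant. The sketch below handles real eigenvalues only and raises otherwise; the function name and the sample matrix are chosen here for illustration.

```python
import math

def eigenvalues_2x2(m):
    """Real eigenvalues of [[a, b], [c, d]] from trace and determinant."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det  # discriminant of the quadratic in lambda
    if disc < 0:
        raise ValueError("complex eigenvalues; not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)

lam1, lam2 = eigenvalues_2x2([[4, 1], [2, 3]])  # trace 7, determinant 10
```

Here `lam1 + lam2` equals the trace and `lam1 * lam2` equals the determinant, consistent with the general eigenvalue identities for trace and determinant.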
Multiplicities
Algebraic multiplicity of \(\lambda\): the number of times \(\lambda\) appears as a root of the characteristic polynomial
Geometric multiplicity of \(\lambda\): \(\dim(E_\lambda) = \dim(\operatorname{null}(A - \lambda I))\)
For every eigenvalue: \(1 \leq \text{geometric multiplicity} \leq \text{algebraic multiplicity}\).
\(\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n\) (product of all eigenvalues, with multiplicity)
\(\operatorname{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n\) (sum of all eigenvalues)
If \(A\mathbf{v} = \lambda\mathbf{v}\), then \(A^k\mathbf{v} = \lambda^k\mathbf{v}\)
Diagonalisation
A matrix \(A\) is diagonalisable if there exists an invertible matrix \(P\) and diagonal matrix \(D\) such that:
\[ A = PDP^{-1} \]
where \(P = [\mathbf{v}_1 | \mathbf{v}_2 | \cdots | \mathbf{v}_n]\) (columns are eigenvectors) and \(D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\).
\(A\) is diagonalisable \(\iff\) for every eigenvalue, geometric multiplicity = algebraic multiplicity \(\iff\) \(A\) has \(n\) linearly independent eigenvectors.
Matrix Powers via Diagonalisation
If \(A = PDP^{-1}\), then:
\[ A^k = PD^kP^{-1} \]
where \(D^k = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k)\) is trivial to compute.
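The identity can be checked on a small matrix. In this sketch the matrix \(A\), its eigenvalues 5 and 2, and the eigenvector matrix \(P\) are an illustrative example worked out by hand, not taken from the notes above; `matmul` is a hypothetical helper.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[4, 1], [2, 3]]
eigenvalues = [5, 2]
P = [[1, 1], [1, -2]]  # columns: eigenvectors (1, 1) for 5 and (1, -2) for 2
P_inv = [[Fraction(2, 3), Fraction(1, 3)], [Fraction(1, 3), Fraction(-1, 3)]]

def power_via_diagonalisation(k):
    # D^k only requires raising each diagonal entry to the k-th power.
    D_k = [[eigenvalues[0] ** k, 0], [0, eigenvalues[1] ** k]]
    return matmul(matmul(P, D_k), P_inv)

def power_by_repeated_multiplication(k):
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result

cube = power_via_diagonalisation(3)
```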
Markov Chains
A transition matrix \(T\) has columns summing to 1 (column-stochastic). The state after \(k\) steps: \(\mathbf{x}_k = T^k \mathbf{x}_0\).
The steady-state vector \(\mathbf{q}\) satisfies \(T\mathbf{q} = \mathbf{q}\) (eigenvector for \(\lambda = 1\)), normalised so entries sum to 1. Every column-stochastic matrix has \(\lambda = 1\) as an eigenvalue.
Markov Chain: A system has transition matrix \(T = \begin{bmatrix}0.7 & 0.4 \\ 0.3 & 0.6\end{bmatrix}\). Find the steady state.
Solution: Solve \((T - I)\mathbf{q} = \mathbf{0}\):
\[\begin{bmatrix}-0.3&0.4\\0.3&-0.4\end{bmatrix}\begin{pmatrix}q_1\\q_2\end{pmatrix} = \mathbf{0}\]
From the first row: \(-0.3q_1 + 0.4q_2 = 0 \implies q_1 = \frac{4}{3}q_2\).
With constraint \(q_1 + q_2 = 1\): \(\frac{4}{3}q_2 + q_2 = 1 \implies \frac{7}{3}q_2 = 1 \implies q_2 = \frac{3}{7}\), \(q_1 = \frac{4}{7}\).
Steady state: \(\mathbf{q} = \begin{pmatrix}4/7 \\ 3/7\end{pmatrix}\).
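The same steady state can be approached numerically by iterating \(\mathbf{x}_{k+1} = T\mathbf{x}_k\). This is a rough sketch: the starting vector and the choice of 50 steps are arbitrary assumptions, and convergence in general depends on the transition matrix.

```python
def step(T, x):
    """One Markov step: return the matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

T = [[0.7, 0.4], [0.3, 0.6]]
x = [1.0, 0.0]  # arbitrary initial distribution
for _ in range(50):
    x = step(T, x)

# x is now close to the exact steady state (4/7, 3/7).
error = max(abs(x[0] - 4 / 7), abs(x[1] - 3 / 7))
```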
Topic 6: Linear Transformations and Projections
Definition of Linear Transformation
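A function \(T: \mathbb{R}^n \to \mathbb{R}^m\) is a linear transformation if for all \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\) and \(c \in \mathbb{R}\):
\(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\) (preserves addition)
\(T(c\mathbf{u}) = cT(\mathbf{u})\) (preserves scalar multiplication)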
Quick test: If \(T(\mathbf{0}) \neq \mathbf{0}\), then \(T\) is NOT linear.
Transformation Matrix
Every linear transformation \(T: \mathbb{R}^n \to \mathbb{R}^m\) can be represented as matrix multiplication:
\[ T(\mathbf{x}) = A\mathbf{x} \]
where the standard matrix is:
\[ A = [T(\mathbf{e}_1) | T(\mathbf{e}_2) | \cdots | T(\mathbf{e}_n)] \]
Finding [T] from Input-Output Pairs
If you know \(T(\mathbf{u}_1) = \mathbf{w}_1, \ldots, T(\mathbf{u}_n) = \mathbf{w}_n\) where \(\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}\) forms a basis, then:
\[ [T] \cdot [\mathbf{u}_1 | \cdots | \mathbf{u}_n] = [\mathbf{w}_1 | \cdots | \mathbf{w}_n] \]
\[ [T] = [\mathbf{w}_1 | \cdots | \mathbf{w}_n][\mathbf{u}_1 | \cdots | \mathbf{u}_n]^{-1} \]
Composition of Transformations
If \(T\) has matrix \(A\) and \(S\) has matrix \(B\), then the composite \(S \circ T\) has matrix \(BA\) (right to left, matching function composition order).
The projection matrix onto the line in the direction of \(\mathbf{a}\) is:
\[ P = \frac{\mathbf{a}\mathbf{a}^T}{\mathbf{a}^T\mathbf{a}} \]
The projection of \(\mathbf{v}\) onto the line: \(\operatorname{proj}_{\mathbf{a}} \mathbf{v} = P\mathbf{v} = \frac{\mathbf{a} \cdot \mathbf{v}}{\mathbf{a} \cdot \mathbf{a}}\mathbf{a}\)
The distance from \(\mathbf{v}\) to the line: \(\|\mathbf{v} - P\mathbf{v}\|\)
Projection matrices are idempotent: \(P^2 = P\). Applying the projection twice gives the same result as applying it once.
Range and Kernel
For a linear transformation \(T(\mathbf{x}) = A\mathbf{x}\):
Range (image) = \(\operatorname{col}(A)\) = set of all possible outputs
Kernel (null space) = \(\operatorname{null}(A)\) = set of all inputs mapped to \(\mathbf{0}\)
Least Squares
When \(A\mathbf{x} = \mathbf{b}\) has no exact solution (inconsistent system), the least squares solution minimises \(\|A\mathbf{x} - \mathbf{b}\|^2\). If the columns of \(A\) are linearly independent (so that \(A^T A\) is invertible), it is given by:
\[ \hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b} \]
This solves the normal equations: \(A^T A \hat{\mathbf{x}} = A^T \mathbf{b}\).
Projection and distance: Find the projection of \(\mathbf{v} = \begin{pmatrix}3\\4\end{pmatrix}\) onto the line spanned by \(\mathbf{a} = \begin{pmatrix}1\\2\end{pmatrix}\), and the distance from \(\mathbf{v}\) to this line.
Solution:
\[\operatorname{proj}_{\mathbf{a}} \mathbf{v} = \frac{\mathbf{a} \cdot \mathbf{v}}{\mathbf{a} \cdot \mathbf{a}}\mathbf{a} = \frac{3(1)+4(2)}{1^2+2^2}\begin{pmatrix}1\\2\end{pmatrix} = \frac{11}{5}\begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}11/5\\22/5\end{pmatrix}\]
The projection matrix: \(P = \frac{1}{5}\begin{bmatrix}1&2\\2&4\end{bmatrix}\). Verify: \(P^2 = P\).
Distance:
\[\mathbf{v} - P\mathbf{v} = \begin{pmatrix}3 - 11/5\\4 - 22/5\end{pmatrix} = \begin{pmatrix}4/5\\-2/5\end{pmatrix}\]
\[\|\mathbf{v} - P\mathbf{v}\| = \sqrt{(4/5)^2 + (-2/5)^2} = \sqrt{16/25 + 4/25} = \sqrt{20/25} = \frac{2\sqrt{5}}{5}\]
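The computation above, including the idempotence check, can be replayed in exact arithmetic. The helper names `projection_matrix` and `apply` are hypothetical, and the sketch assumes \(\mathbf{a}\) is a nonzero vector with integer entries.

```python
from fractions import Fraction

def projection_matrix(a):
    """P = a a^T / (a^T a) for a nonzero integer vector a."""
    norm_sq = sum(x * x for x in a)
    return [[Fraction(ai * aj, norm_sq) for aj in a] for ai in a]

def apply(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

P = projection_matrix([1, 2])
v = [3, 4]
projection = apply(P, v)                              # (11/5, 22/5)
residual = [vi - pi for vi, pi in zip(v, projection)]  # (4/5, -2/5)
distance_squared = sum(r * r for r in residual)        # 4/5, so distance = 2*sqrt(5)/5

# Idempotence: applying P to each column of P gives back the same column.
columns = [[row[j] for row in P] for j in range(2)]
is_idempotent = all(apply(P, col) == col for col in columns)
```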
Least squares regression: Fit a line \(y = c_0 + c_1 x\) to the points \((1, 2), (2, 3), (3, 6)\).
Solution: Set up the system \(A\mathbf{c} = \mathbf{b}\):
\[ A = \begin{bmatrix}1&1\\1&2\\1&3\end{bmatrix}, \quad \mathbf{b} = \begin{pmatrix}2\\3\\6\end{pmatrix} \]
Compute normal equations:
\[A^T A = \begin{bmatrix}3&6\\6&14\end{bmatrix}, \quad A^T\mathbf{b} = \begin{bmatrix}1&1&1\\1&2&3\end{bmatrix}\begin{pmatrix}2\\3\\6\end{pmatrix} = \begin{pmatrix}11\\26\end{pmatrix}\]
Solve \(A^TA\hat{\mathbf{c}} = A^T\mathbf{b}\):
\[\begin{bmatrix}3&6\\6&14\end{bmatrix}\begin{pmatrix}c_0\\c_1\end{pmatrix} = \begin{pmatrix}11\\26\end{pmatrix}\]
\(\det(A^TA) = 42 - 36 = 6\)
\[\hat{\mathbf{c}} = \frac{1}{6}\begin{bmatrix}14&-6\\-6&3\end{bmatrix}\begin{pmatrix}11\\26\end{pmatrix} = \frac{1}{6}\begin{pmatrix}154-156\\-66+78\end{pmatrix} = \frac{1}{6}\begin{pmatrix}-2\\12\end{pmatrix} = \begin{pmatrix}-1/3\\2\end{pmatrix}\]
The best-fit line is \(y = -\frac{1}{3} + 2x\).
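For a straight-line fit the normal equations are 2x2, so they can be solved in closed form. The sketch below builds \(A^T A\) and \(A^T \mathbf{b}\) from sums over the data and applies the 2x2 inverse formula; the name `fit_line` is an assumption of this example.

```python
from fractions import Fraction

def fit_line(points):
    """Least squares fit y = c0 + c1 x via the normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Normal equations: [[n, sx], [sx, sxx]] (c0, c1)^T = (sy, sxy)^T
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("A^T A is singular: all x values coincide")
    c0 = Fraction(sxx * sy - sx * sxy) / det
    c1 = Fraction(n * sxy - sx * sy) / det
    return c0, c1

c0, c1 = fit_line([(1, 2), (2, 3), (3, 6)])
```

For larger models it is generally preferable to solve the normal equations by elimination, or to use a numerical library, rather than forming an explicit inverse.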