Math is not just a way of calculating numerical answers; it is a way of thinking, using clear definitions for concepts and rigorous logic to organize our thoughts and back up our assertions.
Cheng (2025)
Some key results that these lecture notes use are listed here.
Theorem 1 (Equalities are transitive) If \(a=b\) and \(b=c\), then \(a=c\).
Theorem 2 (Substituting equivalent expressions) If \(a = b\), then for any function \(f(x)\), \(f(a) = f(b)\).
Theorem 3 (Adding to both sides of an inequality) If \(a<b\), then \(a+c < b+c\).
Theorem 4 (Negating both sides of an inequality) If \(a < b\), then \(-a > -b\).
Theorem 5 (Multiplying both sides of an inequality by a positive constant) If \(a < b\) and \(c > 0\), then \(ca < cb\).
Theorem 6 (Negation as multiplication) \[-a = (-1) \times a\]
Theorem 7 (adding zero changes nothing) \[a+0=a\]
Theorem 8 (Sums are symmetric) \[a+b = b+a\]
Theorem 9 (Sums are associative)
\[(a + b) + c = a + (b + c)\]
Theorem 10 (Multiplying by 1 changes nothing) \[a \times 1 = a\]
Theorem 11 (Products are symmetric) \[a \times b = b \times a\]
Theorem 12 (Products are associative) \[(a \times b) \times c = a \times (b \times c)\]
Theorem 13 (Division can be written as a product) \[\frac {a}{b} = a \times \frac{1}{b}\]
Theorem 14 (Multiplication is distributive) \[a(b+c) = ab + ac\]
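These identities can be spot-checked numerically in R; the values below are arbitrary, and all.equal() allows for floating-point rounding:

a <- 2.5; b <- -3; c <- 7
all.equal((a + b) + c, a + (b + c))    # Theorem 9: sums are associative
all.equal(a * b, b * a)                # Theorem 11: products are symmetric
all.equal(a / b, a * (1 / b))          # Theorem 13: division as a product
all.equal(a * (b + c), a * b + a * c)  # Theorem 14: distributivity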
Definition 1 (Quotients, fractions, rates) A quotient (also written as a fraction or rate) is an expression of the form
\[\frac{a}{b}\]
where \(a\) is called the numerator and \(b\) is called the denominator.
Definition 2 (Ratios) A ratio is a quotient in which the numerator and denominator are measured using the same unit scales.
Definition 3 (Proportion) In statistics, a “proportion” typically means a ratio where the numerator represents a subset of the denominator.
Definition 4 (Proportional) Two functions \(f(x)\) and \(g(x)\) are proportional if their ratio \(\frac{f(x)}{g(x)}\) does not depend on \(x\). (cf. https://en.wikipedia.org/wiki/Proportionality_(mathematics))
Additional reference on proportions in statistics: https://en.wikipedia.org/wiki/Population_proportion#Mathematical_definition
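In R, a proportion in the sense of Definition 3 can be computed as the mean of a logical vector; the data below are made up for illustration:

x <- c(3, 7, 2, 9, 4, 8)  # hypothetical measurements
sum(x > 5) / length(x)    # numerator counts a subset of the denominator
mean(x > 5)               # equivalent: the mean of a logical vector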
Theorem 15 (logarithm of a product is the sum of the logs of the factors) For \(a, b > 0\): \[ \log{(a\cdot b)} = \log{a} + \log{b} \]
Corollary 1 (logarithm of a quotient)
\[\log{\frac{a}{b}} = \log{a} - \log{b}\]
Theorem 16 (logarithm of an exponential function) \[ \text{log}{\left\{a^b\right\}} = b \cdot\text{log}{\left\{a\right\}} \]
Theorem 17 (exponential of a sum)
\[\text{exp}{\left\{a+b\right\}} = \text{exp}{\left\{a\right\}} \cdot\text{exp}{\left\{b\right\}}\]
Corollary 2 (exponential of a difference)
\[\text{exp}{\left\{a-b\right\}} = \frac{\text{exp}{\left\{a\right\}}}{\text{exp}{\left\{b\right\}}}\]
Theorem 18 (exponential of a product) \[a^{bc} = {\left(a^b\right)}^c = {\left(a^c\right)}^b\]
Corollary 3 (natural exponential of a product) \[\text{exp}{\left\{ab\right\}} = (\text{exp}{\left\{a\right\}})^b = (\text{exp}{\left\{b\right\}})^a\]
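These logarithm and exponential rules can also be spot-checked numerically in R for arbitrary positive values:

a <- 3.2; b <- 1.7
all.equal(log(a * b), log(a) + log(b))  # Theorem 15
all.equal(log(a / b), log(a) - log(b))  # Corollary 1
all.equal(log(a^b), b * log(a))         # Theorem 16
all.equal(exp(a + b), exp(a) * exp(b))  # Theorem 17
all.equal(a^(b * 2), (a^b)^2)           # Theorem 18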
Exercise 1 For \(a \ge 0\) and \(b,c \in \mathbb{R}\), when does \((a^b)^c = a^{(b^c)}\)?
Solution 1. Short answer: rarely (that’s all you need to know for this course).
Long answer:
If \(a > 0\) and \((a^b)^c = a^{(b^c)}\), then since \((a^b)^c = a^{bc}\), we can take logarithms: \[a^{bc} = a^{(b^c)}\] \[\text{log}{\left\{a^{bc}\right\}} = \text{log}{\left\{a^{(b^c)}\right\}}\] \[bc \cdot \text{log}{\left\{a\right\}} = b^c\cdot \text{log}{\left\{a\right\}} \tag{1}\]
Equation 1 holds in each of the following cases: when \(\text{log}{\left\{a\right\}} = 0\) (i.e., \(a = 1\)), or when \(bc = b^c\) (see Exercise 2).
In particular, when \(a=0\) and \(c=0\), \(bc = 0\) and \(b^c = 1\) (for any \(b \in \mathbb{R}\), using the convention \(0^0 = 1\)), so \(\text{sign}{\left\{bc\right\}}\neq \text{sign}{\left\{b^c\right\}}\), and \((a^b)^c \neq a^{(b^c)}\):
\[ \begin{aligned} (a^b)^c &= (0^b)^0 \\ &= 1 \end{aligned} \]
\[ \begin{aligned} a^{(b^c)} &= 0^{(b^0)} \\ &= 0^1 \\ &= 0 \end{aligned} \]
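R follows the convention \(0^0 = 1\), so this counterexample can be checked directly:

b <- 2
(0^b)^0  # = 1, since R evaluates 0^0 as 1
0^(b^0)  # = 0^1 = 0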
Exercise 2 For \(b,c \in \mathbb{R}\), when does \(b^c = bc\)?
Solution 2. \(bc = b^c\) in each of the following cases, among others: when \(c = 1\) (so that \(b^c = b = bc\)), and when \(b = c = 2\) (so that \(b^c = 4 = bc\)).
See the red contours in Figure 2 for a visualization.
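Before plotting, the two cases noted above can be checked directly:

b <- 4; c <- 1
b * c == b^c  # TRUE when c = 1
b <- 2; c <- 2
b * c == b^c  # TRUE when b = c = 2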
# Functions computing the two surfaces to compare
`b*c_f` <- function(b, c) b * c
`b^c_f` <- function(b, c) b^c
# Grids of b and c values
values_b <- seq(0, 5, by = .01)
values_c <- seq(-.5, 3, by = .01)
`b*c` <- outer(values_b, values_c, `b*c_f`)
`b^c` <- outer(values_b, values_c, `b^c_f`)
`b^c`[is.infinite(`b^c`)] <- NA  # mask infinite values so plotly can render
opacity <- .3
z_min <- min(`b*c`, `b^c`, na.rm = TRUE)
z_max <- 5
plotly::plot_ly(
x = ~values_b,
y = ~values_c
) |>
plotly::add_surface(
z = ~ t(`b*c`),
contours = list(
z = list(
show = TRUE,
start = -1,
end = 1,
size = .1
)
),
name = "b*c",
showscale = FALSE,
opacity = opacity,
colorscale = list(c(0, 1), c("green", "green"))
) |>
plotly::add_surface(
opacity = opacity,
colorscale = list(c(0, 1), c("red", "red")),
z = ~ t(`b^c`),
contours = list(
z = list(
show = TRUE,
start = z_min,
end = z_max,
size = .2
)
),
showscale = FALSE,
name = "b^c"
) |>
plotly::layout(
scene = list(
xaxis = list(
# type = "log",
title = "b"
),
yaxis = list(
# type = "log",
title = "c"
),
zaxis = list(
# type = "log",
range = c(z_min, z_max),
title = "outcome"
),
camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
aspectratio = list(x = .9, y = .8, z = 0.7)
)
)

`b^c - b*c_f` <- function(b, c) `b^c_f`(b, c) - `b*c_f`(b, c)
mat1 <- outer(values_b, values_c, `b^c - b*c_f`)
mat1[is.infinite(mat1)] <- NA  # mask infinite values so plotly can render
opacity <- .3
plotly::plot_ly(
x = ~values_b,
y = ~values_c
) |>
plotly::add_surface(
z = ~ t(mat1),
contours = list(
z = list(
show = TRUE,
start = 0,
end = 1,
size = 1,
color = "red"
)
),
name = "b^c - b*c",
showscale = TRUE,
opacity = opacity
) |>
plotly::layout(
scene = list(
xaxis = list(
# type = "log",
title = "b"
),
yaxis = list(
# type = "log",
title = "c"
),
zaxis = list(
title = "outcome"
),
camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
aspectratio = list(x = .9, y = .8, z = 0.7)
)
)

Theorem 19 (\(\text{exp}{\left\{\right\}}\) and \(\text{log}{\left\{\right\}}\) are mutual inverses) For \(a > 0\): \[\text{exp}{\left\{\text{log}{\left\{a\right\}}\right\}} = \text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\]
Theorem 20 (Constant rule) \[\frac{\partial}{\partial x}c = 0\]
Theorem 21 (Constant multiple rule) If \(a\) is constant with respect to \(x\), then: \[\frac{\partial}{\partial x}(a y) = a \frac{\partial y}{\partial x}\]
Theorem 22 (Power rule) \[\frac{\partial}{\partial x}x^q = qx^{q-1}\]
Theorem 23 (Derivative of natural logarithm) \[\text{log}'{\left\{x\right\}} = \frac{1}{x} = x^{-1}\]
Theorem 24 (derivative of exponential) \[\text{exp}'{\left\{x\right\}} = \text{exp}{\left\{x\right\}}\]
Theorem 25 (Product rule) \[(ab)' = ab' + ba'\]
Theorem 26 (Quotient rule) \[(a/b)' = a'/b - (a/b^2)b'\]
Theorem 27 (Chain rule) \[\begin{aligned} \frac{\partial a}{\partial c} &= \frac{\partial a}{\partial b} \frac{\partial b}{\partial c} \\ &= \frac{\partial b}{\partial c} \frac{\partial a}{\partial b} \end{aligned} \]
or in Euler/Lagrange notation:
\[(f(g(x)))' = g'(x) f'(g(x))\]
Corollary 4 (Chain rule for logarithms) \[ \frac{\partial}{\partial x}\log{f(x)} = \frac{f'(x)}{f(x)} \]
Proof. Apply Theorem 27 and Theorem 23.
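Corollary 4 can be checked numerically with a central finite difference; the function f below is a made-up example:

f <- function(x) x^2 + 1  # hypothetical example function
fprime <- function(x) 2 * x
x0 <- 1.5; h <- 1e-6
(log(f(x0 + h)) - log(f(x0 - h))) / (2 * h)  # finite-difference derivative of log(f(x))
fprime(x0) / f(x0)                           # analytic value f'(x)/f(x)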
Definition 5 (Column vector) A column vector of length \(p\) is an ordered list of \(p\) numbers, written vertically:
\[ \tilde{x}= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \]
Definition 6 (Transpose) The transpose of a column vector \(\tilde{x}\) is the row vector with the same sequence of entries, written horizontally:
\[ {\tilde{x}}^{\top} \equiv \tilde{x}' \equiv [x_1,\; x_2,\; \ldots,\; x_p] \]
Definition 7 (Zero vector) The zero vector \(\tilde{0}\) of length \(p\) has all entries equal to zero:
\[ \tilde{0}= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]
Definition 8 (Ones vector) The ones vector \(\tilde{1}\) of length \(p\) has all entries equal to one:
\[ \tilde{1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \]
Definition 9 (Indicator vector / standard basis vector) The \(j\)-th indicator vector (or standard basis vector) \(\tilde{e}_j\) of length \(p\) has a \(1\) in position \(j\) and \(0\)s elsewhere:
\[ (\tilde{e}_j)_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad \tilde{e}_j = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \leftarrow \text{position } j \]
Theorem 28 (Indicator vectors select entries) For any vector \(\tilde{x}\) of length \(p\) and any \(j \in \{1, \ldots, p\}\):
\[{\tilde{e}_j}^{\top}\tilde{x}= x_j\]
Proof. Writing the product componentwise:
\[ \begin{aligned} {\tilde{e}_j}^{\top}\tilde{x} &= \sum_{i=1}^{p} (\tilde{e}_j)_i\, x_i \\&= \sum_{i=1}^{p} \begin{cases} 1 \cdot x_i & \text{if } i = j \\ 0 \cdot x_i & \text{if } i \neq j \end{cases} \\&= x_j \end{aligned} \]
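A quick check of Theorem 28 in R, with arbitrary values:

p <- 5; j <- 3
x <- c(10, 20, 30, 40, 50)
e_j <- diag(p)[, j]  # j-th standard basis vector
drop(t(e_j) %*% x)   # returns 30 = x[j]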
Definition 10 (Dot product/linear combination/inner product) For any two real-valued vectors \(\tilde{x}= (x_1, \ldots, x_n)\) and \(\tilde{y}= (y_1, \ldots, y_n)\), the dot-product, linear combination, or inner product of \(\tilde{x}\) and \(\tilde{y}\) is:
\[\tilde{x}\cdot \tilde{y}= \tilde{x}^{\top} \tilde{y}\stackrel{\text{def}}{=}\sum_{i=1}^nx_i y_i\]
Theorem 29 (Dot product is symmetric) The dot product is symmetric:
\[\tilde{x}\cdot \tilde{y}= \tilde{y}\cdot \tilde{x}\]
Proof. Apply Definition 10 and Theorem 11 (Products are symmetric):
\[\tilde{x}\cdot \tilde{y}= \sum_{i=1}^n x_i y_i = \sum_{i=1}^n y_i x_i = \tilde{y}\cdot \tilde{x}\]
Example 1 (Dot product as matrix multiplication) The dot product of two column vectors \(\tilde{x}\) and \(\tilde{\beta}\) can be written as a matrix product of the row vector \({\tilde{x}}^{\top}\) with the column vector \(\tilde{\beta}\):
\[ \begin{aligned} \tilde{x}\cdot \tilde{\beta} &= {\tilde{x}}^{\top}\, \tilde{\beta} \\ &= [x_1,\; x_2,\; \ldots,\; x_p] \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \\ &= x_1\beta_1 + x_2\beta_2 + \cdots + x_p \beta_p \end{aligned} \]
Definition 11 (Orthogonal vectors) Two vectors \(\tilde{x}\) and \(\tilde{y}\) of the same length are orthogonal (written \(\tilde{x}\perp \tilde{y}\)) if their dot product is zero:
\[\tilde{x}\perp \tilde{y}\iff {\tilde{x}}^{\top}\tilde{y}= 0\]
Definition 12 (Orthonormal vectors) A set of vectors \(\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k\}\) is orthonormal if the vectors are mutually orthogonal and each has unit length:
\[{\tilde{x}_i}^{\top}\tilde{x}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}\]
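In R, a matrix with orthonormal columns can be obtained from a QR decomposition; the random matrix below is arbitrary:

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
Q <- qr.Q(qr(A))       # columns of Q are orthonormal
round(t(Q) %*% Q, 10)  # approximately the 3 x 3 identity matrix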
Definition 13 (Matrix) A matrix of dimensions \(m \times n\) is a rectangular array of \(m \cdot n\) numbers, arranged in \(m\) rows and \(n\) columns:
\[ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \]
Definition 14 (Matrix transpose) The transpose of an \(m \times n\) matrix \(\mathbf{A}\) is the \(n \times m\) matrix \({\mathbf{A}}^{\top}\) obtained by swapping the rows and columns of \(\mathbf{A}\):
\[({\mathbf{A}}^{\top})_{ij} = a_{ji}\]
Theorem 30 (Transpose of a sum) \[{(\mathbf{A} + \mathbf{B})}^{\top} = {\mathbf{A}}^{\top} + {\mathbf{B}}^{\top}\]
In particular, for column vectors \(\tilde{x}\) and \(\tilde{y}\):
\[{(\tilde{x}+ \tilde{y})}^{\top} = {\tilde{x}}^{\top} + {\tilde{y}}^{\top}\]
Theorem 31 (Transpose of a product) For compatible matrices \(\mathbf{A}\) and \(\mathbf{B}\):
\[{(\mathbf{A}\mathbf{B})}^{\top} = {\mathbf{B}}^{\top}\,{\mathbf{A}}^{\top}\]
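A numerical check of Theorem 31 with arbitrary compatible matrices:

A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(1:12, nrow = 3)  # 3 x 4
all.equal(t(A %*% B), t(B) %*% t(A))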
Definition 15 (Zero matrix) The \(m \times n\) zero matrix \(\mathbf{0}_{m \times n}\) (or \(\mathbf{0}\) when dimensions are clear from context) has all entries equal to zero:
\[ \mathbf{0}_{m \times n} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \]
Definition 16 (Matrix addition) Two matrices \(\mathbf{A}\) and \(\mathbf{B}\) of the same dimensions \(m \times n\) can be added element-wise:
\[(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}\]
Theorem 32 (Matrix addition is commutative) \[\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}\]
Theorem 33 (Matrix addition is associative) \[(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})\]
Theorem 34 (Zero matrix is the additive identity) \[\mathbf{A} + \mathbf{0} = \mathbf{A}\]
Theorem 35 (Additive inverse) For any matrix \(\mathbf{A}\), the matrix \(-\mathbf{A}\) (defined by \((-\mathbf{A})_{ij} = -a_{ij}\)) satisfies:
\[\mathbf{A} + (-\mathbf{A}) = \mathbf{0}\]
Definition 17 (Scalar multiplication) A matrix \(\mathbf{A}\) can be multiplied by a scalar \(c\):
\[(c\mathbf{A})_{ij} = c \cdot a_{ij}\]
Definition 18 (Matrix multiplication) The product of an \(m \times k\) matrix \(\mathbf{A}\) and a \(k \times n\) matrix \(\mathbf{B}\) is the \(m \times n\) matrix \(\mathbf{C} = \mathbf{A}\mathbf{B}\) with entries:
\[c_{ij} = \sum_{s=1}^{k} a_{is}\, b_{sj}\]
Theorem 36 (Matrix multiplication is associative) \[(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})\]
Theorem 37 (Matrix multiplication is distributive over addition) \[\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\]
\[(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{A}\mathbf{C} + \mathbf{B}\mathbf{C}\]
Definition 19 (Matrix-vector multiplication) The product of an \(m \times p\) matrix \(\mathbf{A}\) and a \(p \times 1\) column vector \(\tilde{x}\) is the \(m \times 1\) column vector \(\mathbf{A}\tilde{x}\) with entries:
\[(\mathbf{A}\tilde{x})_i = \sum_{j=1}^{p} a_{ij}\, x_j\]
Definition 20 (Square matrix) A matrix is square if it has the same number of rows as columns. The number of rows (= columns) is the order of the matrix.
Definition 21 (Matrix power) For a square matrix \(\mathbf{A}\) of order \(p\) and a positive integer \(k\), the \(k\)-th power of \(\mathbf{A}\) is:
\[\mathbf{A}^k = \underbrace{\mathbf{A}\,\mathbf{A}\cdots\mathbf{A}}_{k \text{ copies}}\]
In particular, \(\mathbf{A}^2 = \mathbf{A}\mathbf{A}\).
Definition 22 (Identity matrix) The \(p \times p\) identity matrix \(\mathbf{I}_p\) (or \(\mathbf{I}\) when the size is clear from context) has ones on the main diagonal and zeros elsewhere:
\[ (\mathbf{I}_p)_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad \mathbf{I}_p = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \]
Theorem 38 (Identity matrix is a multiplicative identity) For any \(m \times p\) matrix \(\mathbf{A}\):
\[\mathbf{A}\,\mathbf{I}_p = \mathbf{A}\]
\[\mathbf{I}_m\,\mathbf{A} = \mathbf{A}\]
Definition 23 (Symmetric matrix) A square matrix \(\mathbf{A}\) is symmetric if \({\mathbf{A}}^{\top} = \mathbf{A}\), i.e., \(a_{ij} = a_{ji}\) for all \(i\) and \(j\).
Definition 24 (Diagonal matrix) A square matrix \(\mathbf{D}\) is a diagonal matrix if all off-diagonal entries are zero: \(d_{ij} = 0\) whenever \(i \neq j\):
\[ \mathbf{D} = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_p \end{bmatrix} \]
Definition 25 (Matrix inverse) For a square \(p \times p\) matrix \(\mathbf{A}\), the inverse \(\mathbf{A}^{-1}\) (if it exists) is the unique matrix satisfying:
\[\mathbf{A}\,\mathbf{A}^{-1} = \mathbf{A}^{-1}\,\mathbf{A} = \mathbf{I}_p\]
Theorem 39 (Inverse of a product) For invertible matrices \(\mathbf{A}\) and \(\mathbf{B}\):
\[(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\]
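A numerical check of Theorem 39; the random Gaussian matrices below are invertible with probability one:

set.seed(2)
A <- matrix(rnorm(9), 3, 3)
B <- matrix(rnorm(9), 3, 3)
all.equal(solve(A %*% B), solve(B) %*% solve(A))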
Definition 26 (Idempotent matrix) A square matrix \(\mathbf{A}\) is idempotent if
\[\mathbf{A}^2 = \mathbf{A}\]
Definition 27 (Projection matrix) A square matrix \(\mathbf{P}\) is a projection matrix (also called an orthogonal projector) if it is both symmetric and idempotent:
\[{\mathbf{P}}^{\top} = \mathbf{P} \qquad \text{and} \qquad \mathbf{P}^2 = \mathbf{P}\]
Theorem 40 (Complement of a projection matrix) If \(\mathbf{P}\) is a projection matrix, then \(\mathbf{I} - \mathbf{P}\) is also a projection matrix.
Proof. We verify symmetry and idempotency.
Symmetry: \[{(\mathbf{I} - \mathbf{P})}^{\top} = {\mathbf{I}}^{\top} - {\mathbf{P}}^{\top} = \mathbf{I} - \mathbf{P}\]
Idempotency: \[\begin{aligned} (\mathbf{I} - \mathbf{P})^2 &= (\mathbf{I} - \mathbf{P})(\mathbf{I} - \mathbf{P}) \\ &= \mathbf{I} - \mathbf{P} - \mathbf{P} + \mathbf{P}^2 \\ &= \mathbf{I} - \mathbf{P} - \mathbf{P} + \mathbf{P} \\ &= \mathbf{I} - \mathbf{P} \end{aligned}\]
Theorem 41 (Hat matrix is a projection matrix) In a linear regression model with full-rank design matrix \(\mathbf{X}\), the hat matrix
\[\mathbf{H} = \mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top}\]
is a projection matrix.
Proof. We verify symmetry and idempotency.
Symmetry: \[\begin{aligned} {\mathbf{H}}^{\top} &= {\left(\mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top}\right)}^{\top} \\ &= {({\mathbf{X}}^{\top})}^{\top} \cdot {\left(({\mathbf{X}}^{\top}\mathbf{X})^{-1}\right)}^{\top} \cdot {\mathbf{X}}^{\top} \\ &= \mathbf{X}\cdot ({\mathbf{X}}^{\top}\mathbf{X})^{-1} \cdot {\mathbf{X}}^{\top} \\ &= \mathbf{H} \end{aligned}\]
where the third line uses \({({\mathbf{X}}^{\top})}^{\top} = \mathbf{X}\) and the fact that \({\mathbf{X}}^{\top}\mathbf{X}\) is symmetric, so its inverse is also symmetric (\({\left(({\mathbf{X}}^{\top}\mathbf{X})^{-1}\right)}^{\top} = ({\mathbf{X}}^{\top}\mathbf{X})^{-1}\)).
Idempotency: \[\begin{aligned} \mathbf{H}^2 &= \mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top} \cdot \mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top} \\ &= \mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}({\mathbf{X}}^{\top}\mathbf{X})({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top} \\ &= \mathbf{X}({\mathbf{X}}^{\top}\mathbf{X})^{-1}{\mathbf{X}}^{\top} \\ &= \mathbf{H} \end{aligned}\]
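Theorem 41 can be checked numerically for an arbitrary full-rank design matrix:

set.seed(3)
X <- cbind(1, rnorm(10), rnorm(10))    # a hypothetical 10 x 3 design matrix
H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
all.equal(t(H), H)                     # symmetry
all.equal(H %*% H, H)                  # idempotency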
Theorem 42 (Projection matrices produce orthogonal decompositions) If \(\mathbf{P}\) is a projection matrix and \(\tilde{v}\) is any vector of compatible dimension, then the two components of the decomposition
\[\tilde{v} = \underbrace{\mathbf{P}\tilde{v}}_{\text{projected}} + \underbrace{(\mathbf{I} - \mathbf{P})\tilde{v}}_{\text{residual}}\]
are orthogonal:
\[\mathbf{P}\tilde{v} \;\perp\; (\mathbf{I} - \mathbf{P})\tilde{v}\]
Proof. \[\begin{aligned} {(\mathbf{P}\tilde{v})}^{\top}\,(\mathbf{I} - \mathbf{P})\tilde{v} &= {\tilde{v}}^{\top}\,{\mathbf{P}}^{\top}\,(\mathbf{I} - \mathbf{P})\tilde{v} \\ &= {\tilde{v}}^{\top}\,\mathbf{P}\,(\mathbf{I} - \mathbf{P})\tilde{v} \\ &= {\tilde{v}}^{\top}\,(\mathbf{P} - \mathbf{P}^2)\tilde{v} \\ &= {\tilde{v}}^{\top}\,(\mathbf{P} - \mathbf{P})\tilde{v} \\ &= {\tilde{v}}^{\top}\,\mathbf{0}\,\tilde{v} \\ &= 0 \end{aligned}\]
where the second line uses symmetry (\({\mathbf{P}}^{\top} = \mathbf{P}\)) and the fourth line uses idempotency (\(\mathbf{P}^2 = \mathbf{P}\)).
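As a concrete example of Theorem 42, the matrix \(\mathbf{P} = \frac{1}{n}\tilde{1}{\tilde{1}}^{\top}\) projects onto the constant vector, so \(\mathbf{P}\tilde{v}\) holds the mean of \(\tilde{v}\) and \((\mathbf{I} - \mathbf{P})\tilde{v}\) holds the centered values; the vector below is arbitrary:

n <- 6
ones <- rep(1, n)
P <- ones %*% t(ones) / n  # projection onto the constant vector
v <- c(4, 8, 15, 16, 23, 42)
fitted <- drop(P %*% v)    # each entry equals mean(v)
resid <- v - fitted        # (I - P) v, the centered values
sum(fitted * resid)        # ~ 0: the two components are orthogonal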
Definition 28 (Quadratic form) A quadratic form is an expression of the form
\[{\tilde{x}}^{\top}\, \mathbf{S}\, \tilde{x}\]
where \(\tilde{x}\) is a \(p \times 1\) vector and \(\mathbf{S}\) is a \(p \times p\) matrix.
Theorem 43 (Symmetric part of a quadratic form) If \(\mathbf{S}\) is a square matrix, then
\[ {\tilde{x}}^{\top}\mathbf{S}\tilde{x} = {\tilde{x}}^{\top}\left(\frac{1}{2}(\mathbf{S}+{\mathbf{S}}^{\top})\right)\tilde{x}. \]
So the value of a quadratic form depends only on the symmetric part of \(\mathbf{S}\).
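A numerical check of Theorem 43 with an arbitrary non-symmetric matrix:

set.seed(4)
S <- matrix(rnorm(9), 3, 3)  # non-symmetric
x <- c(1, -2, 3)
S_sym <- (S + t(S)) / 2      # symmetric part of S
all.equal(drop(t(x) %*% S %*% x), drop(t(x) %*% S_sym %*% x))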
Definition 29 (Design matrix) In a regression model with \(n\) observations and \(p\) predictors, the design matrix (or model matrix) \(\mathbf{X}\) is the \(n \times p\) matrix whose \(i\)-th row is the covariate vector \({\tilde{x}_i}^{\top}\) for observation \(i\):
\[ \mathbf{X}= \begin{bmatrix} {\tilde{x}_1}^{\top} \\ {\tilde{x}_2}^{\top} \\ \vdots \\ {\tilde{x}_n}^{\top} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \]
(adapted from Fieller (2016), §7.2)
Let \(\tilde{x}\) and \(\tilde{\beta}\) be column vectors of length \(p\) (see Definition 5 and Definition 10).
Definition 30 (Vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) \\ \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) \\ \vdots \\ \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]
Definition 31 (Row-vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) & \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) & \cdots & \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]
Theorem 44 (Row and column derivatives are transposes) \[\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta})\right)}^{\top}\]
\[\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta})\right)}^{\top}\]
Theorem 45 (Derivative of a dot product) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{x}\cdot \tilde{\beta}= \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}\cdot \tilde{x}= \tilde{x} \]
Proof. \[ \begin{aligned} \frac{\partial}{\partial \tilde{\beta}} ({\tilde{x}}^{\top}\tilde{\beta}) &= \begin{bmatrix} \frac{\partial}{\partial \beta_1}(x_1\beta_1+x_2\beta_2 +\cdots+x_p \beta_p ) \\ \frac{\partial}{\partial \beta_2}(x_1\beta_1+x_2\beta_2 +\cdots+x_p \beta_p ) \\ \vdots \\ \frac{\partial}{\partial \beta_p}(x_1\beta_1+x_2\beta_2 +\cdots+x_p \beta_p ) \end{bmatrix} \\ &= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \\ &= \tilde{x} \end{aligned} \]
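Theorem 45 can be checked with a finite-difference gradient; the vectors below are arbitrary:

x <- c(2, -1, 4)
f <- function(beta) sum(x * beta)  # x . beta
beta0 <- c(1, 1, 1); h <- 1e-6
grad <- sapply(seq_along(beta0), function(j) {
  e <- replace(numeric(length(beta0)), j, h)
  (f(beta0 + e) - f(beta0 - e)) / (2 * h)
})
round(grad, 6)  # ~ c(2, -1, 4) = x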
Theorem 46 (Derivative of a quadratic form) For a quadratic form (Definition 28), if \(\mathbf{S}\) is a symmetric \(p\times p\) matrix that is constant with respect to \(\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}} {\tilde{\beta}}^{\top}\mathbf{S}\tilde{\beta}= 2\mathbf{S}\tilde{\beta} \]
(If \(\mathbf{S}\) is not symmetric, replace it by its symmetric part, per Theorem 43.)
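A finite-difference check of Theorem 46 with an arbitrary symmetric matrix:

set.seed(5)
S <- matrix(rnorm(9), 3, 3); S <- (S + t(S)) / 2  # symmetric S
qf <- function(beta) drop(t(beta) %*% S %*% beta)
beta0 <- c(1, 2, 3); h <- 1e-6
grad <- sapply(1:3, function(j) {
  e <- replace(numeric(3), j, h)
  (qf(beta0 + e) - qf(beta0 - e)) / (2 * h)
})
all.equal(grad, drop(2 * S %*% beta0), tolerance = 1e-4)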
Corollary 5 (Derivative of a simple quadratic form) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'\tilde{\beta}= 2\tilde{\beta} \]
Theorem 47 (Vector chain rule) \[\frac{\partial z}{\partial \tilde{x}} = \frac{\partial y}{\partial \tilde{x}} \frac{\partial z}{\partial y}\]
or in Euler/Lagrange notation:
\[(f(g(\tilde{x})))' = \tilde{g}'(\tilde{x})\, f'(g(\tilde{x}))\]
Corollary 6 (Vector chain rule for quadratic forms) \[\frac{\partial}{\partial \tilde{\beta}}{{\left(\tilde{\varepsilon}(\tilde{\beta})\cdot \tilde{\varepsilon}(\tilde{\beta})\right)}} = {\left(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta})\right)} {\left(2 \tilde{\varepsilon}(\tilde{\beta})\right)}\]