Probability density function
Definition C.9 (probability density) If \(X\) is a continuous random variable, then the probability density of \(X\) at value \(x\), denoted \(f(x)\), \(f_X(x)\), \(\text{p}(x)\), \(\text{p}_X(x)\), or \(\text{p}(X=x)\), is defined as the limit of the probability (mass) that \(X\) falls in an interval around \(x\), divided by the width of that interval, as that width shrinks to 0:
\[
\begin{aligned}
f(x) &\stackrel{\text{def}}{=}\lim_{\delta \rightarrow 0}
\frac{\text{P}(X \in [x, x + \delta])}{\delta}
\end{aligned}
\]
Theorem C.8 (Density function is derivative of CDF) The density function \(f(t)\) or \(\text{p}(T=t)\) of a random variable \(T\) at value \(t\) is equal to the derivative of the cumulative distribution function \(F(t) \stackrel{\text{def}}{=}\text{P}(T\le t)\); that is:
\[f(t) = \frac{d}{dt} F(t)\]
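As a numerical illustration of Definition C.9 and Theorem C.8 (not part of the formal development), the following Python sketch approximates the density of a standard normal distribution by the finite-difference ratio \(\text{P}(X \in [x, x + \delta])/\delta\) and compares it to the exact density; the standard normal, the evaluation point, and the interval widths are arbitrary choices, and scipy is assumed to be available.

```python
# Sketch: approximate f(x) by P(X in [x, x + delta]) / delta for shrinking delta,
# using the standard normal as an arbitrary example. Requires scipy.
from scipy.stats import norm

x = 1.0
for delta in [0.1, 0.01, 0.001]:
    # P(X in [x, x + delta]) = F(x + delta) - F(x), divided by the interval width
    approx = (norm.cdf(x + delta) - norm.cdf(x)) / delta
    print(f"delta={delta}: approx={approx:.6f}, exact f(x)={norm.pdf(x):.6f}")
```

As \(\delta\) shrinks, the ratio approaches the exact density, which is also the derivative of the CDF at \(x\).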
Theorem C.9 (Density functions integrate to 1) For any density function \(f(x)\),
\[\int_{x \in \mathcal{R}(X)} f(x) dx = 1\]
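For instance (an illustration only; the standard normal is an arbitrary choice and its range is the whole real line), numerical integration of a density over its range returns approximately 1:

```python
# Sketch: integrate a density over its range and check the result is ~1.
# The standard normal is an arbitrary example; requires numpy and scipy.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

total, _ = quad(norm.pdf, -np.inf, np.inf)  # integral of f(x) dx over R(X)
print(total)  # ~1.0
```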
Expectation
Definition C.11 (Expectation, expected value, population mean) The expectation, expected value, or population mean of a continuous random variable \(X\), denoted \(\mathbb{E}\left[X\right]\), \(\mu(X)\), or \(\mu_X\), is the mean of \(X\)’s possible values, weighted by the probability density function of those values:
\[\mathbb{E}\left[X\right] = \int_{x\in \mathcal{R}(X)} x \cdot \text{p}(X=x)dx\]
The expectation, expected value, or population mean of a discrete random variable \(X\), denoted \(\mathbb{E}\left[X\right]\), \(\mu(X)\), or \(\mu_X\), is the mean of \(X\)’s possible values, weighted by the probability mass function of those values:
\[\mathbb{E}\left[X\right] = \sum_{x \in \mathcal{R}(X)} x \cdot \text{P}(X=x)\]
(cf. https://en.wikipedia.org/wiki/Expected_value)
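As an illustration of both formulas (the fair six-sided die and the Exponential(1) distribution are arbitrary examples; numpy and scipy are assumed):

```python
# Sketch: E[X] as a probability-weighted sum (discrete) and as the integral
# of x * f(x) (continuous). Distributions chosen purely for illustration.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

# Discrete example: a fair six-sided die
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print(np.sum(values * pmf))            # 3.5

# Continuous example: Exponential(rate = 1), whose mean is 1
mean, _ = quad(lambda x: x * expon.pdf(x), 0, np.inf)
print(mean)                            # ~1.0
```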
Theorem C.10 (Expectation of the Bernoulli distribution) The expectation of a Bernoulli random variable with parameter \(\pi\) is:
\[\mathbb{E}\left[X\right] = \pi\]
Proof. \[
\begin{aligned}
\mathbb{E}\left[X\right]
&= \sum_{x\in \mathcal{R}(X)} x \cdot\text{P}(X=x)
\\&= \sum_{x\in \left\{0,1\right\}} x \cdot\text{P}(X=x)
\\&= \left(0 \cdot\text{P}(X=0)\right) + \left(1 \cdot\text{P}(X=1)\right)
\\&= \left(0 \cdot(1-\pi)\right) + \left(1 \cdot\pi\right)
\\&= 0 + \pi
\\&= \pi
\end{aligned}
\]
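A quick simulation check of Theorem C.10 (the parameter value, sample size, and seed are arbitrary):

```python
# Sketch: the sample mean of many Bernoulli(pi) draws should be close to pi.
import numpy as np

rng = np.random.default_rng(seed=1)
pi = 0.3
x = rng.binomial(n=1, p=pi, size=100_000)  # Bernoulli(pi) draws
print(x.mean())  # ~0.3
```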
Variance and related characteristics
Definition C.12 (Variance) The variance of a random variable \(X\) is the expectation of the squared difference between \(X\) and \(\mathbb{E}\left[X\right]\); that is:
\[
\text{Var}\left(X\right) \stackrel{\text{def}}{=}\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^2\right]
\]
Theorem C.11 (Simplified expression for variance) \[\text{Var}\left(X\right)=\mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2\]
Proof. Expanding the square and applying linearity of expectation (noting that \(\mathbb{E}\left[X\right]\) is a constant), we have:
\[
\begin{aligned}
\text{Var}\left(X\right)
&\stackrel{\text{def}}{=}\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^2\right]\\
&=\mathbb{E}\left[X^2 - 2X\mathbb{E}\left[X\right] + \left(\mathbb{E}\left[X\right]\right)^2\right]\\
&=\mathbb{E}\left[X^2\right] - \mathbb{E}\left[2X\mathbb{E}\left[X\right]\right] + \mathbb{E}\left[\left(\mathbb{E}\left[X\right]\right)^2\right]\\
&=\mathbb{E}\left[X^2\right] - 2\mathbb{E}\left[X\right]\mathbb{E}\left[X\right] + \left(\mathbb{E}\left[X\right]\right)^2\\
&=\mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2\\
\end{aligned}
\]
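A simulation-based sanity check of Theorem C.11, using sample moments as stand-ins for expectations (the Gamma distribution, its parameters, and the seed are arbitrary choices):

```python
# Sketch: E[(X - E[X])^2] versus E[X^2] - (E[X])^2, computed from sample moments.
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.gamma(shape=2.0, scale=3.0, size=200_000)  # arbitrary example distribution

direct = np.mean((x - x.mean()) ** 2)        # definition of variance
shortcut = np.mean(x ** 2) - x.mean() ** 2   # simplified expression
print(direct, shortcut)                      # agree up to floating-point error
```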
Definition C.13 (Precision) The precision of a random variable \(X\), often denoted \(\tau(X)\), \(\tau_X\), or shorthanded as \(\tau\), is the inverse of that random variable’s variance; that is:
\[\tau(X) \stackrel{\text{def}}{=}\left(\text{Var}\left(X\right)\right)^{-1}\]
Definition C.14 (Standard deviation) The standard deviation of a random variable \(X\) is the square-root of the variance of \(X\):
\[\text{SD}\left(X\right) \stackrel{\text{def}}{=}\sqrt{\text{Var}\left(X\right)}\]
Definition C.15 (Covariance) For any two one-dimensional random variables, \(X,Y\):
\[\text{Cov}\left(X,Y\right) \stackrel{\text{def}}{=}\mathbb{E}\left[(X - \mathbb{E}\left[X\right])(Y - \mathbb{E}\left[Y\right])\right]\]
Theorem C.12 \[\text{Cov}\left(X,Y\right)= \mathbb{E}\left[XY\right] - \mathbb{E}\left[X\right] \mathbb{E}\left[Y\right]\]
Proof. Left to the reader.
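Although the proof is left to the reader (it parallels the proof of Theorem C.11), here is a simulation-based sanity check; the joint distribution of \(X\) and \(Y\) below is an arbitrary example:

```python
# Sketch: Cov(X, Y) by its definition versus E[XY] - E[X]E[Y], via sample moments.
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.normal(size=200_000)
y = 2 * x + rng.normal(size=200_000)  # Y depends on X, so Cov(X, Y) != 0

direct = np.mean((x - x.mean()) * (y - y.mean()))  # definition of covariance
shortcut = np.mean(x * y) - x.mean() * y.mean()    # Theorem C.12
print(direct, shortcut)                            # both ~2
```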
Lemma C.1 (The covariance of a variable with itself is its variance) For any random variable \(X\):
\[\text{Cov}\left(X,X\right) = \text{Var}\left(X\right)\]
Proof. \[
\begin{aligned}
\text{Cov}\left(X,X\right) &= \mathbb{E}\left[XX\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]
\\ &= \mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2
\\ &= \text{Var}\left(X\right)
\end{aligned}
\]
Definition C.16 (Variance/covariance of a \(p \times 1\) random vector) For a \(p \times 1\) dimensional random vector \(X\),
\[
\begin{aligned}
\text{Var}\left(X\right)
&\stackrel{\text{def}}{=}\text{Cov}\left(X\right)\\
&\stackrel{\text{def}}{=}\mathbb{E}\left[\left( X - \mathbb{E}\left[X\right] \right)\left( X - \mathbb{E}\left[X\right] \right)^{\top}\right]\\
\end{aligned}
\]
Theorem C.13 (Alternate expression for variance of a random vector) \[
\begin{aligned}
\text{Var}\left(X\right)
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top}
\end{aligned}
\]
Proof. \[
\begin{aligned}
\text{Var}\left(X\right)
&= \mathbb{E}\left[\left( X - \mathbb{E}\left[X\right] \right)\left( X - \mathbb{E}\left[X\right] \right)^{\top}\right]\\
&= \mathbb{E}\left[XX^{\top} - \mathbb{E}\left[X\right]X^{\top} - X{\mathbb{E}\left[X\right]}^{\top} + \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top}\right]\\
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top} - \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top} + \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top}\\
&= \mathbb{E}\left[XX^{\top}\right] - 2\,\mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top} + \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top}\\
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]{\mathbb{E}\left[X\right]}^{\top}
\end{aligned}
\]
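A sketch of the vector case, estimating the \(p \times p\) matrix \(\text{Cov}(X)\) from simulated draws of a 3-dimensional random vector and comparing the two expressions; the example distribution, dimension, and seed are arbitrary, and sample moments stand in for expectations:

```python
# Sketch: E[(X - E[X])(X - E[X])^T] versus E[X X^T] - E[X] E[X]^T for a random
# vector X, estimated from simulated draws (rows of `samples`).
import numpy as np

rng = np.random.default_rng(seed=4)
n, p = 200_000, 3
A = rng.normal(size=(p, p))                  # arbitrary mixing matrix
samples = rng.normal(size=(n, p)) @ A.T      # each row is a draw of the p x 1 vector X

mu = samples.mean(axis=0)                    # estimate of E[X]
centered = samples - mu
direct = centered.T @ centered / n                      # E[(X - E[X])(X - E[X])^T]
shortcut = samples.T @ samples / n - np.outer(mu, mu)   # E[X X^T] - E[X] E[X]^T
print(np.allclose(direct, shortcut))         # True
```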
Theorem C.14 (Variance of a linear combination) For any set of random variables \(X_1, \ldots, X_n\) and corresponding constants \(a_1, \ldots, a_n\):
\[\text{Var}\left(\sum_{i=1}^na_i X_i\right) = \sum_{i=1}^n\sum_{j=1}^n a_i a_j \text{Cov}\left(X_i,X_j\right)\]
Proof. Left to the reader…
Lemma C.2 For any two random variables \(X\) and \(Y\) and scalars \(a\) and \(b\):
\[\text{Var}\left(aX + bY\right) = a^2 \text{Var}\left(X\right) + b^2 \text{Var}\left(Y\right) + 2(a \cdot b) \text{Cov}\left(X,Y\right)\]
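A simulation check of Lemma C.2 (the special case \(n = 2\) of Theorem C.14); the constants, joint distribution, and seed are arbitrary:

```python
# Sketch: Var(aX + bY) versus a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y),
# computed from sample moments (population-style, ddof = 0).
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)
a, b = 2.0, -3.0

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # agree up to floating-point error
```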
Definition C.17 (homoskedastic, heteroskedastic) A random variable \(Y\) is homoskedastic (with respect to covariates \(X\)) if the variance of \(Y\) does not vary with \(X\):
\[\text{Var}(Y|X=x) = \sigma^2, \forall x\]
Otherwise it is heteroskedastic.
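For example (a hypothetical simulation; the mean function, noise scales, and seed are arbitrary), a response whose conditional variance grows with the covariate is heteroskedastic, while one with constant conditional variance is homoskedastic:

```python
# Sketch: Y1 is homoskedastic in X (constant residual variance); Y2 is
# heteroskedastic (its conditional variance grows with X). Illustrative only.
import numpy as np

rng = np.random.default_rng(seed=6)
x = rng.uniform(1, 10, size=100_000)
y1 = 2 * x + rng.normal(scale=1.0, size=x.size)  # Var(Y1 | X = x) = 1 for all x
y2 = 2 * x + rng.normal(scale=x, size=x.size)    # Var(Y2 | X = x) = x^2 varies with x

for lo, hi in [(1, 2), (9, 10)]:
    mask = (x >= lo) & (x < hi)
    # residual variance within each covariate bin estimates Var(Y | X in [lo, hi))
    print(lo, hi, np.var(y1[mask] - 2 * x[mask]), np.var(y2[mask] - 2 * x[mask]))
```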
Definition C.18 (Statistical independence) A set of random variables \(X_1, \ldots, X_n\) are statistically independent if their joint probability is equal to the product of their marginal probabilities:
\[\Pr(X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(X_i=x_i)}\]
The symbol for independence, \(⫫\), is essentially just a \(\prod\) turned upside-down, so the symbol itself can remind you of its definition (Definition C.18).
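As a small discrete illustration (two independent fair coin flips, an assumed example), the joint probability table under independence is the outer product of the marginal probabilities, exactly as in Definition C.18:

```python
# Sketch: for two independent fair coins, the joint pmf is the product of the
# marginal pmfs, so the joint table is the outer product of the marginals.
import numpy as np

p_x1 = np.array([0.5, 0.5])    # marginal pmf of X1 over {0, 1}
p_x2 = np.array([0.5, 0.5])    # marginal pmf of X2 over {0, 1}
joint = np.outer(p_x1, p_x2)   # joint pmf under independence
print(joint)                   # every entry is 0.25 = 0.5 * 0.5
```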
Definition C.19 (Conditional independence) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally statistically independent given a set of covariates \(X_1, \ldots, X_n\) if the joint probability of the \(Y_i\)s given the \(X_i\)s is equal to the product of the individual conditional probabilities:
\[\Pr(Y_1=y_1, \ldots, Y_n = y_n|X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(Y_i=y_i|X_i=x_i)}\]
Definition C.20 (Identically distributed) A set of random variables \(X_1, \ldots, X_n\) are identically distributed if they have the same range \(\mathcal{R}(X)\) and if their marginal distributions \(\text{P}(X_1=x_1), \ldots, \text{P}(X_n=x_n)\) are all equal to some shared distribution \(\text{P}(X=x)\):
\[
\forall i\in \left\{1:n\right\}, \forall x \in \mathcal{R}(X): \text{P}(X_i=x) = \text{P}(X=x)
\]
Definition C.21 (Conditionally identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally identically distributed given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) have the same range \(\mathcal{R}(Y)\) and if the conditional distributions \(\text{P}(Y_i=y_i|X_i =x_i)\) are all equal to the same distribution \(\text{P}(Y=y|X=x)\):
\[
\forall i\in \left\{1:n\right\}, \forall y \in \mathcal{R}(Y), \forall x \in \mathcal{R}(X): \text{P}(Y_i=y|X_i=x) = \text{P}(Y=y|X=x)
\]
Definition C.22 (Independent and identically distributed) A set of random variables \(X_1, \ldots, X_n\) are independent and identically distributed (shorthand: “\(X_i\ \text{iid}\)”) if they are statistically independent and identically distributed.
Definition C.23 (Conditionally independent and identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally independent and identically distributed (shorthand: “\(Y_i | X_i\ \text{ciid}\)” or just “\(Y_i |X_i\ \text{iid}\)”) given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) are conditionally independent given \(X_1, \ldots, X_n\) (Definition C.19) and conditionally identically distributed given \(X_1, \ldots, X_n\) (Definition C.21).
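For instance (a hypothetical generative model; the coefficient, noise scale, and seed are arbitrary), the following sketch draws \(Y_i | X_i\) ciid: each \(Y_i\) is drawn independently, and each from the same conditional distribution given its own \(X_i\):

```python
# Sketch: Y_i | X_i ciid, with Y_i | X_i = x_i ~ Normal(beta * x_i, sigma^2)
# for every i. The parameter values are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(seed=7)
beta, sigma = 1.5, 2.0
x = rng.uniform(0, 10, size=5)              # covariates X_1, ..., X_5
y = rng.normal(loc=beta * x, scale=sigma)   # one independent draw of Y_i per X_i
print(np.column_stack([x, y]))
```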