Probability density function
Definition C.10 (probability density) If \(X\) is a continuous random variable, then the probability density of \(X\) at value \(x\), denoted \(f(x)\), \(f_X(x)\), \(\text{p}(x)\), \(\text{p}_X(x)\), or \(\text{p}(X=x)\), is defined as the limit of the probability that \(X\) falls in an interval around \(x\), divided by the width of that interval, as that width shrinks to 0:
\[
\begin{aligned}
f(x) &\stackrel{\text{def}}{=}\lim_{\Delta \rightarrow 0}
\frac{\text{P}(X \in [x, x + \Delta])}{\Delta}
\end{aligned}
\]
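To make the limiting ratio concrete, here is a minimal Python sketch (not from the text; it assumes `scipy` is available) that approximates the density of a standard normal at \(x = 1\) by \(\text{P}(X \in [x, x+\Delta])/\Delta\) for shrinking interval widths, using the normal CDF:

```python
# Approximate f(x) by P(X in [x, x + delta]) / delta for shrinking delta,
# using a standard normal as an arbitrary example distribution.
from scipy.stats import norm

x = 1.0
for delta in (1.0, 0.1, 0.01, 0.001):
    approx = (norm.cdf(x + delta) - norm.cdf(x)) / delta
    print(f"delta = {delta:6.3f}: ratio = {approx:.5f}")

print(f"exact density f({x}) = {norm.pdf(x):.5f}")  # the limit of the ratios
```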
Theorem C.9 (Density function is derivative of CDF) The density function \(f(t)\) or \(\text{p}(T=t)\) for a random variable \(T\) at value \(t\) is equal to the derivative of the cumulative distribution function \(F(t) \stackrel{\text{def}}{=}\text{P}(T\le t)\); that is:
\[f(t) = \frac{d}{dt} F(t)\]
Theorem C.10 (Density functions integrate to 1) For any density function \(f(x)\),
\[\int_{x \in \mathcal{R}(X)} f(x) dx = 1\]
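As a quick illustrative check (not from the text), the following Python snippet numerically integrates an Exponential(rate = 2) density over its range and recovers 1, up to numerical error:

```python
# Numerically integrate an example density over its range; the result is ~1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

rate = 2.0
total, _ = quad(lambda x: expon.pdf(x, scale=1 / rate), 0, np.inf)
print(total)  # ~1.0
```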
Expectation
Definition C.12 (Expectation, expected value, population mean) The expectation, expected value, or population mean of a continuous random variable \(X\), denoted \(\text{E}{\left[X\right]}\), \(\mu(X)\), or \(\mu_X\), is the weighted mean of \(X\)’s possible values, weighted by the probability density function of those values:
\[\text{E}{\left[X\right]} = \int_{x\in \mathcal{R}(X)} x \cdot \text{p}(X=x)dx\]
The expectation, expected value, or population mean of a discrete random variable \(X\), denoted \(\text{E}{\left[X\right]}\), \(\mu(X)\), or \(\mu_X\), is the mean of \(X\)’s possible values, weighted by the probability mass function of those values:
\[\text{E}{\left[X\right]} = \sum_{x \in \mathcal{R}(X)} x \cdot \text{P}(X=x)\]
(cf. https://en.wikipedia.org/wiki/Expected_value)
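The following Python sketch (illustrative only; the distributions are arbitrary examples) computes each version of Definition C.12 directly, by numeric integration in the continuous case and by a finite weighted sum in the discrete case:

```python
# E[X] as a probability-weighted mean: integral for a continuous X,
# sum for a discrete X.
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom, expon

# Continuous: X ~ Exponential(rate = 2), so E[X] = 1/2.
rate = 2.0
mean_cont, _ = quad(lambda x: x * expon.pdf(x, scale=1 / rate), 0, np.inf)
print(mean_cont)  # ~0.5

# Discrete: X ~ Binomial(n = 10, p = 0.3), so E[X] = n * p = 3.
n, p = 10, 0.3
support = np.arange(n + 1)
mean_disc = np.sum(support * binom.pmf(support, n, p))
print(mean_disc)  # ~3.0
```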
Theorem C.11 (Expectation of the Bernoulli distribution) The expectation of a Bernoulli random variable with parameter \(\pi\) is:
\[\text{E}{\left[X\right]} = \pi\]
Proof. \[
\begin{aligned}
\text{E}{\left[X\right]}
&= \sum_{x\in \mathcal{R}(X)} x \cdot\text{P}(X=x)
\\&= \sum_{x\in {\left\{0,1\right\}}} x \cdot\text{P}(X=x)
\\&= {\left(0 \cdot\text{P}(X=0)\right)} + {\left(1 \cdot\text{P}(X=1)\right)}
\\&= {\left(0 \cdot(1-\pi)\right)} + {\left(1 \cdot\pi\right)}
\\&= 0 + \pi
\\&= \pi
\end{aligned}
\]
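A quick Monte Carlo check of Theorem C.11 (illustrative, assuming `numpy`): the sample mean of many Bernoulli(\(\pi\)) draws should be close to \(\pi\).

```python
# Sample mean of Bernoulli(pi) draws approaches pi.
import numpy as np

rng = np.random.default_rng(seed=0)
pi = 0.3
draws = rng.binomial(n=1, p=pi, size=100_000)  # Bernoulli(pi) = Binomial(1, pi)
print(draws.mean())  # ~0.3
```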
Theorem C.12 (Expectation of time-to-event variables) If \(T\) is a non-negative random variable with survival function \(\text{S}(t) \stackrel{\text{def}}{=}\text{P}(T > t)\), then:
\[\text{E}{\left[T\right]} = \int_{t=0}^{\infty}\text{S}(t)dt\]
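As a numeric illustration (not from the text), for \(T \sim \text{Exponential}(\text{rate} = 0.5)\) the integral of the survival function recovers \(\text{E}{\left[T\right]} = 2\):

```python
# Integrate the survival function S(t) = P(T > t); the result equals E[T].
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

rate = 0.5
integral, _ = quad(lambda t: expon.sf(t, scale=1 / rate), 0, np.inf)
print(integral, expon.mean(scale=1 / rate))  # both ~2.0
```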
Variance and related characteristics
Definition C.13 (Variance) The variance of a random variable \(X\) is the expectation of the squared difference between \(X\) and \(\text{E}{\left[X\right]}\); that is:
\[
\text{Var}{\left(X\right)} \stackrel{\text{def}}{=}\text{E}{\left[(X-\text{E}{\left[X\right]})^2\right]}
\]
Theorem C.13 (Simplified expression for variance) \[\text{Var}{\left(X\right)}=\text{E}{\left[X^2\right]} - {\left(\text{E}{\left[X\right]}\right)}^2\]
Proof. By linearity of expectation, and treating \(\text{E}{\left[X\right]}\) as a constant, we have:
\[
\begin{aligned}
\text{Var}{\left(X\right)}
&\stackrel{\text{def}}{=}\text{E}{\left[(X-\text{E}{\left[X\right]})^2\right]}\\
&=\text{E}{\left[X^2 - 2X\text{E}{\left[X\right]} + {\left(\text{E}{\left[X\right]}\right)}^2\right]}\\
&=\text{E}{\left[X^2\right]} - \text{E}{\left[2X\text{E}{\left[X\right]}\right]} + \text{E}{\left[{\left(\text{E}{\left[X\right]}\right)}^2\right]}\\
&=\text{E}{\left[X^2\right]} - 2\text{E}{\left[X\right]}\text{E}{\left[X\right]} + {\left(\text{E}{\left[X\right]}\right)}^2\\
&=\text{E}{\left[X^2\right]} - {\left(\text{E}{\left[X\right]}\right)}^2\\
\end{aligned}
\]
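A simulation sketch of Theorem C.13 (illustrative; the gamma distribution is an arbitrary choice): both expressions for the variance agree on the same sample, up to floating-point error.

```python
# Compare E[(X - E[X])^2] with E[X^2] - (E[X])^2 on a simulated sample.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.gamma(shape=2.0, scale=3.0, size=200_000)

lhs = np.mean((x - x.mean()) ** 2)    # definition of variance (sample version)
rhs = np.mean(x**2) - x.mean() ** 2   # simplified expression (sample version)
print(lhs, rhs)                       # equal up to floating-point error
```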
Definition C.14 (Precision) The precision of a random variable \(X\), often denoted \(\tau(X)\), \(\tau_X\), or simply \(\tau\), is the inverse of that random variable’s variance; that is:
\[\tau(X) \stackrel{\text{def}}{=}{\left(\text{Var}{\left(X\right)}\right)}^{-1}\]
Definition C.15 (Standard deviation) The standard deviation of a random variable \(X\) is the square-root of the variance of \(X\):
\[\text{SD}{\left(X\right)} \stackrel{\text{def}}{=}\sqrt{\text{Var}{\left(X\right)}}\]
Definition C.16 (Covariance) For any two one-dimensional random variables, \(X,Y\):
\[\text{Cov}{\left(X,Y\right)} \stackrel{\text{def}}{=}\text{E}{\left[(X - \text{E}{\left[X\right]})(Y - \text{E}{\left[Y\right]})\right]}\]
Theorem C.14 \[\text{Cov}{\left(X,Y\right)}= \text{E}{\left[XY\right]} - \text{E}{\left[X\right]} \text{E}{\left[Y\right]}\]
Proof. Left to the reader.
Lemma C.1 (The covariance of a variable with itself is its variance) For any random variable \(X\):
\[\text{Cov}{\left(X,X\right)} = \text{Var}{\left(X\right)}\]
Proof. By Theorem C.14,
\[
\begin{aligned}
\text{Cov}{\left(X,X\right)} &= \text{E}{\left[X \cdot X\right]} - \text{E}{\left[X\right]}\text{E}{\left[X\right]}
\\ &= \text{E}{\left[X^2\right]} - {\left(\text{E}{\left[X\right]}\right)}^2
\\ &= \text{Var}{\left(X\right)}
\end{aligned}
\]
where the last step is Theorem C.13.
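The identities in Theorem C.14 and Lemma C.1 can also be checked by simulation (an illustrative sketch; the correlated pair below is an arbitrary example):

```python
# Cov(X, Y) = E[XY] - E[X]E[Y], and Cov(X, X) = Var(X).
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)  # correlated with x by construction

cov_def = np.mean((x - x.mean()) * (y - y.mean()))   # definition of covariance
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # Theorem C.14
print(cov_def, cov_alt)                              # agree up to rounding

cov_xx = np.mean((x - x.mean()) ** 2)                # Cov(X, X)
print(cov_xx, x.var())                               # equals Var(X)
```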
Definition C.17 (Variance/covariance of a \(p \times 1\) random vector) For a \(p \times 1\) dimensional random vector \(\tilde{X}\),
\[
\begin{aligned}
\text{Var}{\left(\tilde{X}\right)}
&\stackrel{\text{def}}{=}\text{Cov}{\left(\tilde{X}\right)}
\\
&\stackrel{\text{def}}{=}\text{E}{\left[{\left(\tilde{X}- \text{E}{\left[\tilde{X}\right]}\right)} {\left(\tilde{X}- \text{E}{\left[\tilde{X}\right]}\right)}^{\top}\right]}
\end{aligned}
\]
This is a \(p \times p\) matrix whose \((i,j)\) entry is \(\text{Cov}{\left(X_i,X_j\right)}\).
Theorem C.15 (Alternate expression for variance of a random vector) \[
\begin{aligned}
\text{Var}{\left(\tilde{X}\right)}
&= \text{E}{\left[\tilde{X}\tilde{X}^{\top}\right]} - \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top}
\end{aligned}
\]
Proof. \[
\begin{aligned}
\text{Var}{\left(\tilde{X}\right)}
&= \text{E}{\left[{\left(\tilde{X}- \text{E}{\left[\tilde{X}\right]}\right)}{\left(\tilde{X}- \text{E}{\left[\tilde{X}\right]}\right)}^{\top}\right]}\\
&= \text{E}{\left[\tilde{X}\tilde{X}^{\top} - \text{E}{\left[\tilde{X}\right]}\tilde{X}^{\top} - \tilde{X}\text{E}{\left[\tilde{X}\right]}^{\top} + \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top}\right]}\\
&= \text{E}{\left[\tilde{X}\tilde{X}^{\top}\right]} - \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top} - \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top} + \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top}\\
&= \text{E}{\left[\tilde{X}\tilde{X}^{\top}\right]} - 2\text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top} + \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top}\\
&= \text{E}{\left[\tilde{X}\tilde{X}^{\top}\right]} - \text{E}{\left[\tilde{X}\right]}\text{E}{\left[\tilde{X}\right]}^{\top}
\end{aligned}
\]
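Here is a simulation sketch of Definition C.17 and Theorem C.15 for a 3-dimensional random vector (illustrative only; rows of the matrix `X` below are independent draws of the vector):

```python
# Compare E[(X - E[X])(X - E[X])^T] with E[X X^T] - E[X] E[X]^T.
import numpy as np

rng = np.random.default_rng(seed=3)
n, p = 200_000, 3
A = rng.normal(size=(p, p))               # arbitrary mixing matrix to induce correlation
X = rng.normal(size=(n, p)) @ A.T         # each row is one draw of the p-vector

mu = X.mean(axis=0)                       # E[X], a p-vector
centered = X - mu
cov_def = centered.T @ centered / n       # E[(X - E[X])(X - E[X])^T]
cov_alt = X.T @ X / n - np.outer(mu, mu)  # E[X X^T] - E[X] E[X]^T
print(np.max(np.abs(cov_def - cov_alt)))  # ~0 (floating-point error only)
```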
Theorem C.16 (Variance of a linear combination) For any vector of random variables \(\tilde{X}= (X_1, \ldots, X_n)\) and corresponding vector of constants \(\tilde{a}= (a_1, \ldots, a_n)\), the variance of their linear combination is:
\[
\begin{aligned}
\text{Var}{\left(\tilde{a}\cdot \tilde{X}\right)}
&= \text{Var}{\left(\sum_{i=1}^na_i X_i\right)}
\\ &= \tilde{a}^{\top} \text{Var}{\left(\tilde{X}\right)} \tilde{a}
\\ &= \sum_{i=1}^n\sum_{j=1}^n a_i a_j \text{Cov}{\left(X_i,X_j\right)}
\end{aligned}
\]
Proof. Left to the reader…
Corollary C.3 For any two random variables \(X\) and \(Y\) and scalars \(a\) and \(b\):
\[\text{Var}{\left(aX + bY\right)} = a^2 \text{Var}{\left(X\right)} + b^2 \text{Var}{\left(Y\right)} + 2(a \cdot b) \text{Cov}{\left(X,Y\right)}\]
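Corollary C.3 can be checked numerically as well (an illustrative sketch; the pair \((X, Y)\) and scalars \(a, b\) are arbitrary):

```python
# Var(aX + bY) vs. a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y), by simulation.
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.normal(size=200_000)
y = 0.7 * x + rng.normal(size=200_000)  # correlated with x by construction
a, b = 2.0, -1.5

direct = np.var(a * x + b * y)
formula = (a**2 * np.var(x) + b**2 * np.var(y)
           + 2 * a * b * np.cov(x, y, bias=True)[0, 1])
print(direct, formula)                  # agree up to floating-point error
```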
Definition C.18 (homoskedastic, heteroskedastic) A random variable \(Y\) is homoskedastic (with respect to covariates \(X\)) if the variance of \(Y\) does not vary with \(X\):
\[\text{Var}(Y|X=x) = \sigma^2, \forall x\]
Otherwise it is heteroskedastic.
Definition C.19 (Statistical independence) A set of random variables \(X_1, \ldots, X_n\) are statistically independent if their joint probability is equal to the product of their marginal probabilities:
\[\Pr(X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(X_i=x_i)}\]
The symbol for independence, \(\perp\!\!\!\perp\), is essentially just \(\prod\) upside-down. So the symbol can remind you of its definition (Definition C.19).
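As a simulation sketch of Definition C.19 (illustrative; two independently generated indicator variables), the empirical joint probability factors into the product of the empirical marginals:

```python
# For independently generated X1, X2: P(X1 = 1, X2 = 1) ~ P(X1 = 1) * P(X2 = 1).
import numpy as np

rng = np.random.default_rng(seed=5)
x1 = rng.binomial(1, 0.3, size=500_000)
x2 = rng.binomial(1, 0.6, size=500_000)  # generated independently of x1

joint = np.mean((x1 == 1) & (x2 == 1))
product = np.mean(x1 == 1) * np.mean(x2 == 1)
print(joint, product)                    # both ~0.18
```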
Definition C.20 (Conditional independence) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally statistically independent given a set of covariates \(X_1, \ldots, X_n\) if their joint conditional probability given the covariates is equal to the product of their individual conditional probabilities:
\[\Pr(Y_1=y_1, \ldots, Y_n = y_n|X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(Y_i=y_i|X_i=x_i)}\]
Definition C.21 (Identically distributed) A set of random variables \(X_1, \ldots, X_n\) are identically distributed if they have the same range \(\mathcal{R}(X)\) and if their marginal distributions \(\text{P}(X_1=x_1), ..., \text{P}(X_n=x_n)\) are all equal to some shared distribution \(\text{P}(X=x)\):
\[
\forall i\in {\left\{1:n\right\}}, \forall x \in \mathcal{R}(X): \text{P}(X_i=x) = \text{P}(X=x)
\]
Definition C.22 (Conditionally identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally identically distributed given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) have the same range \(\mathcal{R}(Y)\) and if the conditional distributions \(\text{P}(Y_i=y_i|X_i =x_i)\) are all equal to the same distribution \(\text{P}(Y=y|X=x)\):
\[
\forall i\in {\left\{1:n\right\}}: \text{P}(Y_i=y|X_i=x) = \text{P}(Y=y|X=x)
\]
Definition C.23 (Independent and identically distributed) A set of random variables \(X_1, \ldots, X_n\) are independent and identically distributed (shorthand: “\(X_i\ \text{iid}\)”) if they are statistically independent and identically distributed.
Definition C.24 (Conditionally independent and identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally independent and identically distributed (shorthand: “\(Y_i | X_i\ \text{ciid}\)” or just “\(Y_i |X_i\ \text{iid}\)”) given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) are conditionally independent given \(X_1, \ldots, X_n\) and \(Y_1, \ldots, Y_n\) are identically distributed given \(X_1, \ldots, X_n\).