Definition B.9 (Hazard function) The hazard function for a random variable \(T\) at value \(t\) is the conditional density of \(T\) at \(t\), given \(T\ge t\); that is:
\[h(t) \stackrel{\text{def}}{=}p(T=t|T\ge t)\]
If \(T\) represents the time at which an event occurs, then \(h(t)\) is the probability (or, for continuous \(T\), the probability density) that the event occurs at time \(t\), given that it has not occurred before time \(t\).
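For example, if \(T\) follows an exponential distribution with rate \(\lambda\), so that \(p(T=t) = \lambda e^{-\lambda t}\) and \(\Pr(T \ge t) = e^{-\lambda t}\), then:
\[h(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda\]
That is, the exponential distribution has a constant hazard, reflecting its memoryless property.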
Definition B.10 (Expectation, expected value, population mean) The expectation, expected value, or population mean of a continuous random variable \(X\), denoted \(\mathbb{E}\left[X\right]\), \(\mu(X)\), or \(\mu_X\), is the weighted mean of \(X\)’s possible values, weighted by the probability density function of those values:
\[\mathbb{E}\left[X\right] = \int_{x\in \mathcal{R}(X)} x \cdot \text{p}(X=x)dx\]
The expectation, expected value, or population mean of a discrete random variable \(X\), denoted \(\mathbb{E}\left[X\right]\), \(\mu(X)\), or \(\mu_X\), is the mean of \(X\)’s possible values, weighted by the probability mass function of those values:
\[\mathbb{E}\left[X\right] = \sum_{x \in \mathcal{R}(X)} x \cdot \text{P}(X=x)\]
(cf. https://en.wikipedia.org/wiki/Expected_value)
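For example, if \(X\) is the result of rolling a fair six-sided die, then:
\[\mathbb{E}\left[X\right] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5\]
If instead \(X\) is uniformly distributed on \([0,1]\), so that \(\text{p}(X=x) = 1\) for \(x \in [0,1]\), then:
\[\mathbb{E}\left[X\right] = \int_0^1 x \cdot 1 \, dx = \frac{1}{2}\]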
Variance and related characteristics
Definition B.11 (Variance) The variance of a random variable \(X\) is the expectation of the squared difference between \(X\) and \(\mathbb{E}\left[X\right]\); that is:
\[
\text{Var}\left(X\right) \stackrel{\text{def}}{=}\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^2\right]
\]
Theorem B.5 (Simplified expression for variance) \[\text{Var}\left(X\right)=\mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2\]
Proof. By linearity of expectation, we have:
\[
\begin{aligned}
\text{Var}\left(X\right)
&\stackrel{\text{def}}{=}\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^2\right]\\
&=\mathbb{E}\left[X^2 - 2X\mathbb{E}\left[X\right] + \left(\mathbb{E}\left[X\right]\right)^2\right]\\
&=\mathbb{E}\left[X^2\right] - \mathbb{E}\left[2X\mathbb{E}\left[X\right]\right] + \mathbb{E}\left[\left(\mathbb{E}\left[X\right]\right)^2\right]\\
&=\mathbb{E}\left[X^2\right] - 2\mathbb{E}\left[X\right]\mathbb{E}\left[X\right] + \left(\mathbb{E}\left[X\right]\right)^2\\
&=\mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2\\
\end{aligned}
\]
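For example, if \(X \sim \text{Bernoulli}(\pi)\), then \(X^2 = X\) (since \(X \in \{0, 1\}\)), so \(\mathbb{E}\left[X^2\right] = \mathbb{E}\left[X\right] = \pi\), and by Theorem B.5:
\[\text{Var}\left(X\right) = \pi - \pi^2 = \pi(1-\pi)\]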
Definition B.12 (Precision) The precision of a random variable \(X\), often denoted \(\tau(X)\), \(\tau_X\), or simply \(\tau\), is the inverse of that random variable’s variance; that is:
\[\tau(X) \stackrel{\text{def}}{=}\left(\text{Var}\left(X\right)\right)^{-1}\]
Definition B.13 (Standard deviation) The standard deviation of a random variable \(X\) is the square root of the variance of \(X\):
\[\text{SD}\left(X\right) \stackrel{\text{def}}{=}\sqrt{\text{Var}\left(X\right)}\]
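Continuing the Bernoulli example above: if \(X \sim \text{Bernoulli}(\pi)\) with \(0 < \pi < 1\), then \(\text{SD}\left(X\right) = \sqrt{\pi(1-\pi)}\) and \(\tau(X) = \frac{1}{\pi(1-\pi)}\).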
Definition B.14 (Covariance) For any two one-dimensional random variables, \(X,Y\):
\[\text{Cov}\left(X,Y\right) \stackrel{\text{def}}{=}\mathbb{E}\left[(X - \mathbb{E}\left[X\right])(Y - \mathbb{E}\left[Y\right])\right]\]
Theorem B.6 \[\text{Cov}\left(X,Y\right)= \mathbb{E}\left[XY\right] - \mathbb{E}\left[X\right] \mathbb{E}\left[Y\right]\]
Proof. Left to the reader.
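For example, suppose \(X \sim \text{Bernoulli}(1/2)\) and \(Y = 1 - X\). Then \(XY = X(1-X) = 0\) always, so \(\mathbb{E}\left[XY\right] = 0\), and by Theorem B.6:
\[\text{Cov}\left(X,Y\right) = 0 - \frac{1}{2} \cdot \frac{1}{2} = -\frac{1}{4}\]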
Lemma B.1 (The covariance of a variable with itself is its variance) For any random variable \(X\):
\[\text{Cov}\left(X,X\right) = \text{Var}\left(X\right)\]
Proof. By Theorem B.6, followed by Theorem B.5:
\[
\begin{aligned}
\text{Cov}\left(X,X\right) &= \mathbb{E}\left[X \cdot X\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]
\\ &= \mathbb{E}\left[X^2\right] - \left(\mathbb{E}\left[X\right]\right)^2
\\ &= \text{Var}\left(X\right)
\end{aligned}
\]
Definition B.15 (Variance/covariance of a \(p \times 1\) random vector) For a \(p \times 1\) dimensional random vector \(X\), the variance (also called the variance-covariance matrix) is the \(p \times p\) matrix:
\[
\begin{aligned}
\text{Var}\left(X\right)
&\stackrel{\text{def}}{=}\text{Cov}\left(X\right)\\
&\stackrel{\text{def}}{=}\mathbb{E}\left[\left(X - \mathbb{E}\left[X\right]\right)\left(X - \mathbb{E}\left[X\right]\right)^{\top}\right]
\end{aligned}
\]
Theorem B.7 (Alternate expression for variance of a random vector) \[
\begin{aligned}
\text{Var}\left(X\right)
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top}
\end{aligned}
\]
Proof. \[
\begin{aligned}
\text{Var}\left(X\right)
&= \mathbb{E}\left[\left(X - \mathbb{E}\left[X\right]\right)\left(X - \mathbb{E}\left[X\right]\right)^{\top}\right]\\
&= \mathbb{E}\left[XX^{\top} - X\mathbb{E}\left[X\right]^{\top} - \mathbb{E}\left[X\right]X^{\top} + \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top}\right]\\
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top} - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top} + \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top}\\
&= \mathbb{E}\left[XX^{\top}\right] - 2\mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top} + \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top}\\
&= \mathbb{E}\left[XX^{\top}\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[X\right]^{\top}
\end{aligned}
\]
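For example, for \(p = 2\), writing \(X = (X_1, X_2)^{\top}\), Definition B.15 gives the \(2 \times 2\) matrix:
\[\text{Var}\left(X\right) = \begin{bmatrix} \text{Var}\left(X_1\right) & \text{Cov}\left(X_1,X_2\right) \\ \text{Cov}\left(X_2,X_1\right) & \text{Var}\left(X_2\right) \end{bmatrix}\]
The diagonal entries are the variances of the components, and the off-diagonal entries are their covariances.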
Theorem B.8 (Variance of a linear combination) For any set of random variables \(X_1, \ldots, X_n\) and corresponding constants \(a_1, \ldots, a_n\):
\[\text{Var}\left(\sum_{i=1}^na_i X_i\right) = \sum_{i=1}^n\sum_{j=1}^n a_i a_j \text{Cov}\left(X_i,X_j\right)\]
Proof. Left to the reader.
Lemma B.2 For any two random variables \(X\) and \(Y\) and scalars \(a\) and \(b\):
\[\text{Var}\left(aX + bY\right) = a^2 \text{Var}\left(X\right) + b^2 \text{Var}\left(Y\right) + 2(a \cdot b) \text{Cov}\left(X,Y\right)\]
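Lemma B.2 is the \(n = 2\) case of Theorem B.8 (take \(X_1 = X\), \(X_2 = Y\), \(a_1 = a\), \(a_2 = b\), and apply Lemma B.1 to the \(i = j\) terms). As a quick numerical sanity check of Lemma B.2, here is a minimal simulation sketch using NumPy; the distributions, constants, and variable names are illustrative choices, not part of the text:

```python
import numpy as np

# Simulate two correlated random variables and check that
# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
# holds for the empirical (ddof=0) moments.
rng = np.random.default_rng(seed=0)
n = 1_000_000

x = rng.standard_normal(n)            # X ~ N(0, 1)
y = 0.5 * x + rng.standard_normal(n)  # Y correlated with X

a, b = 2.0, -3.0
lhs = np.var(a * x + b * y)  # empirical Var(aX + bY)
rhs = (a**2 * np.var(x)
       + b**2 * np.var(y)
       + 2 * a * b * np.cov(x, y, ddof=0)[0, 1])

print(lhs, rhs)  # the two agree up to floating-point error
```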
Definition B.16 (homoskedastic, heteroskedastic) A random variable \(Y\) is homoskedastic (with respect to covariates \(X\)) if the variance of \(Y\) does not vary with \(X\):
\[\text{Var}(Y|X=x) = \sigma^2, \forall x\]
Otherwise it is heteroskedastic.
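For example, in the classical linear regression model \(Y|X=x \sim N(\beta_0 + \beta_1 x, \sigma^2)\), \(Y\) is homoskedastic, since the conditional variance \(\sigma^2\) is the same for every \(x\). If instead \(\text{Var}(Y|X=x) = x^2\sigma^2\), then \(Y\) is heteroskedastic.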
Definition B.17 (Statistical independence) A set of random variables \(X_1, \ldots, X_n\) are statistically independent if their joint probability is equal to the product of their marginal probabilities:
\[\Pr(X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(X_i=x_i)}\]
The symbol for independence, \(⫫\), is essentially just a \(\prod\) turned upside-down, which can remind you of its definition (Definition B.17).
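For example, if \(X_1\) and \(X_2\) are the results of two separate flips of a fair coin (coded as 0 and 1), then for every pair \((x_1, x_2)\):
\[\Pr(X_1 = x_1, X_2 = x_2) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \Pr(X_1 = x_1)\Pr(X_2 = x_2)\]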
Definition B.18 (Conditional independence) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally statistically independent given a set of covariates \(X_1, \ldots, X_n\) if the joint probability of the \(Y_i\)s given the \(X_i\)s is equal to the product of the corresponding conditional probabilities:
\[\Pr(Y_1=y_1, \ldots, Y_n = y_n|X_1=x_1, \ldots, X_n = x_n) = \prod_{i=1}^n{\Pr(Y_i=y_i|X_i=x_i)}\]
Definition B.19 (Identically distributed) A set of random variables \(X_1, \ldots, X_n\) are identically distributed if they have the same range \(\mathcal{R}(X)\) and if their marginal distributions \(\text{P}(X_1=x_1), \ldots, \text{P}(X_n=x_n)\) are all equal to some shared distribution \(\text{P}(X=x)\):
\[
\forall i\in \left\{1:n\right\}, \forall x \in \mathcal{R}(X): \text{P}(X_i=x) = \text{P}(X=x)
\]
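Note that identically distributed does not imply independent: for example, \(X_1 = X\) and \(X_2 = X\) are identically distributed but completely dependent.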
Definition B.20 (Conditionally identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally identically distributed given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) have the same range \(\mathcal{R}(Y)\) and if the conditional distributions \(\text{P}(Y_i=y_i|X_i=x_i)\) are all equal to the same distribution \(\text{P}(Y=y|X=x)\):
\[
\forall i\in \left\{1:n\right\}: \text{P}(Y_i=y|X_i=x) = \text{P}(Y=y|X=x)
\]
Definition B.21 (Independent and identically distributed) A set of random variables \(X_1, \ldots, X_n\) are independent and identically distributed (shorthand: “\(X_i\ \text{iid}\)”) if they are statistically independent and identically distributed.
Definition B.22 (Conditionally independent and identically distributed) A set of random variables \(Y_1, \ldots, Y_n\) are conditionally independent and identically distributed (shorthand: “\(Y_i | X_i\ \text{ciid}\)” or just “\(Y_i |X_i\ \text{iid}\)”) given a set of covariates \(X_1, \ldots, X_n\) if \(Y_1, \ldots, Y_n\) are conditionally independent given \(X_1, \ldots, X_n\) and \(Y_1, \ldots, Y_n\) are identically distributed given \(X_1, \ldots, X_n\).