Appendix I — Notation

Published

Last modified: 2025-07-17: 20:17:05 (PM)

Table I.1: Notation used in this book

symbol	meaning	LaTeX
$\neg$	not	`\neg`
$\forall$	all	`\forall`
$\exists$	some	`\exists`
$\cup$	union, “or”	`\cup`
$\cap$	intersection, “and”	`\cap`
$\mid$	given, conditional on	`\mid`, `\|`
$\sum$	sum	`\sum`
$\prod$	product	`\prod`
$\mu$	mean	`\mu`
$\text{E}$	expectation	`\mathbb{E}`
$x^{\top}$	transpose of $x$	`x^{\top}`
$'$	transpose or derivative¹	`'`
$\perp\!\!\!\perp$	independent	`⫫`
$\therefore$	therefore, thus	`\therefore`
$\eta$	linear component of a GLM	`\eta`
$\left \lfloor{x}\right \rfloor$	floor of $x$: largest integer smaller than $x$	`\lfloor x \rfloor`
$\left \lceil{x}\right \rceil$	ceiling of $x$: smallest integer larger than $x$	`\lceil x \rceil`

I.1 Information matrices

There is no consistency in the notation for observed and expected information matrices (see Table I.2).

Table I.2: notation for information matrices

book	observed information	expected information
Dobson and Barnett (2018)	$U'$	$\mathfrak{I}$
Dunn and Smyth (2018)	$\mathfrak{I}$	$\mathcal{I}$
McLachlan and Krishnan (2007)	$I$	$\mathcal{I}$
Wood (2017)	$\hat{I}$	$\mathcal{I}$

These notes currently have a mixture of notations, depending on my whims and what reference I had last looked at. Eventually, I will try to standardize my notation to $I$ for observed information and $\mathcal{I}$ for expected information.

I.2 Percent sign (“%”)

The percent sign “%” is just a shorthand for “$\times \frac{1}{100}$”. The word “percent” comes from the Latin “per centum”; “centum” means 100 in Latin, so “percent” means “per hundred” (c.f., https://en.wikipedia.org/wiki/Percentage)

So, contrary to what you may have learned previously, $10\% = 0.1$ is a true and correct equality, just as $10 \text{kg} = 10,000 \text{g}$ is true and correct.

Proof. \[ \begin{aligned} 10\% &= 10 \times \frac{1}{100} \\ &= \frac{10}{100} \\ &= 0.1 \end{aligned} \]

You are welcome to switch between decimal and percent notation freely; just make sure you execute it correctly.

I.3 Proofs

We can use any of:

$\therefore$ (\therefore in LaTeX),
$\Rightarrow$ (\Rightarrow),
$\models$ (\models)

to denote logical entailments (deductive consequences).

Let’s save $\rightarrow$ (\rightarrow) for convergence results.

I.4 Why is notation in probability and statistics so inconsistent and disorganized?

In grad school, we are asked to learn from increasingly disorganized materials and lectures. Not coincidentally, as the amount of organization decreases, the amount of complexity increases, the amount of difficulty increases, the number of reliable references decreases, and the amount of inconsistency in notation and content increases (both between multiple references and within single references!). In other words, as you approach the cutting-edge of most fields, you start to encounter into content that hasn’t been fully thought through or standardized. This lack of clarity is unfortunate and undesirable, but it is understandable and inevitable.

It’s worth noting that calculus was formalized in the 1600s, elementary algebra was formalized around 820, and arithmetic even earlier. And calculus still has several competing notation systems. In contrast, the field of statistics only emerged in the late 1800s and early 1900s, so it’s not surprising that the notation and terminology is still developing. Generalized linear models were only formalized in 1972 (Nelder and Wedderburn (1972)), which is very recent in terms of the pace of scientific development.

depending on whether it is applied to a matrix or a function↩︎

# Notation {#sec-notation} {{< include macros.qmd >}} | symbol | meaning | LaTeX |-------------|--------------------------------------------------|----------- | $\neg$ | not | `\neg` | $\forall$ | all | `\forall` | $\exists$ | some | `\exists` | $\cup$ | union, "or" | `\cup` | $\cap$ | intersection, "and" | `\cap` | $\mid$ | given, conditional on | `\mid`, `|` | $\sum$ | sum | `\sum` | $\prod$ | product | `\prod` | $\mu$ | mean | `\mu` | $\Expp$ | [expectation](probability.qmd#def-expectation) | `\mathbb{E}` | $x^{\top}$ | transpose of $x$ | `x^{\top}` | $'$ | transpose or derivative[^3] | `'` | $\ind$ | [independent](probability.qmd#def-indpt) | `⫫` | $\tf$ | therefore, thus | `\therefore` | $\eta$ | [linear component of a GLM][eta] | `\eta` | $\floor{x}$ | floor of $x$: largest integer smaller than $x$ | `\lfloor x \rfloor` | $\ceil{x}$ | ceiling of $x$: smallest integer larger than $x$ | `\lceil x \rceil` : Notation used in this book {#tbl-notation-collected} [eta]: https://en.wikipedia.org/wiki/Generalized_linear_model#:~:text=The%20linear%20predictor%20is%20the,data%20through%20the%20link%20function "linear predictor notation" [^3]: depending on whether it is applied to a matrix or a function ## Information matrices There is no consistency in the notation for observed and expected information matrices (see @tbl-info-mat-symbols). | book | observed information | expected information | |-----------------------| -------------------- | -------------------- | |@dobson4e | $U'$ | $\mathfrak{I}$ | |@dunn2018generalized | $\mathfrak{I}$ | $\mathcal{I}$ | |@mclachlan2007em | $I$ | $\mathcal{I}$ | |@wood2017generalized | $\hat{I}$ | $\mathcal{I}$ | : notation for information matrices {#tbl-info-mat-symbols} These notes currently have a mixture of notations, depending on my whims and what reference I had last looked at. Eventually, I will try to standardize my notation to $\oinf$ for observed information and $\einf$ for expected information. ## Percent sign ("%") The percent sign "%" is just a shorthand for "$\times \frac{1}{100}$". The word "percent" comes from the Latin "per centum"; "centum" means 100 in Latin, so "percent" means "per hundred" (c.f., <https://en.wikipedia.org/wiki/Percentage>) So, contrary to what you may have learned previously, $10\% = 0.1$ is a true and correct equality, just as $10 \text{kg} = 10,000 \text{g}$ is true and correct. :::{.proof} $$ \ba 10\% &= 10 \times \frac{1}{100} \\ &= \frac{10}{100} \\ &= 0.1 \ea $$ ::: You are welcome to switch between decimal and percent notation freely; just make sure you execute it correctly. ## Proofs We can use any of: - $\therefore$ (`\therefore` in LaTeX), - $\Rightarrow$ (`\Rightarrow`), - $\models$ (`\models`) to denote logical entailments (deductive consequences). Let's save $\rightarrow$ (`\rightarrow`) for convergence results. ## Why is notation in probability and statistics so inconsistent and disorganized? In grad school, we are asked to learn from increasingly disorganized materials and lectures. Not coincidentally, as the amount of organization decreases, the amount of complexity increases, the amount of difficulty increases, the number of reliable references decreases, and the amount of inconsistency in notation and content increases (both between multiple references and within single references!). In other words, as you approach the cutting-edge of most fields, you start to encounter into content that hasn’t been fully thought through or standardized. This lack of clarity is unfortunate and undesirable, but it is understandable and inevitable. It's worth noting that calculus was formalized in the [1600s](https://en.wikipedia.org/wiki/Leibniz%27s_notation), elementary algebra was formalized around [820](https://en.wikipedia.org/wiki/Al-Jabr), and arithmetic [even earlier](https://en.wikipedia.org/wiki/Arithmetic#History). And calculus still has [several competing notation systems](https://en.wikipedia.org/wiki/Notation_for_differentiation). In contrast, the field of statistics only emerged in the [late 1800s and early 1900s](https://en.wikipedia.org/wiki/History_of_statistics#Development_of_modern_statistics), so it’s not surprising that the notation and terminology is still developing. Generalized linear models were only formalized in 1972 (@nelder1972generalized), which is very recent in terms of the [pace of scientific development](https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions).

book	observed information	expected information
Dobson and Barnett (2018)	\(U'\)	\(\mathfrak{I}\)
Dunn and Smyth (2018)	\(\mathfrak{I}\)	\(\mathcal{I}\)
McLachlan and Krishnan (2007)	\(I\)	\(\mathcal{I}\)
Wood (2017)	\(\hat{I}\)	\(\mathcal{I}\)