Notation

Published

Last modified: 2026-04-14 19:59:04 (UTC)

Table 1: Notation used in this book
symbol meaning LaTeX
\(\neg\) not \neg
\(\forall\) all \forall
\(\exists\) some \exists
\(\cup\) union, “or” \cup
\(\cap\) intersection, “and” \cap
\(\mid\) given, conditional on \mid, |
\(\sum\) sum \sum
\(\prod\) product \prod
\(\mu\) mean \mu
\(\mathbb{E}\) expectation \mathbb{E}
\(x^{\top}\) transpose of \(x\) x^{\top}
\('\) transpose or derivative¹ '
\(\perp\!\!\!\perp\) independent \perp\!\!\!\perp
\(\therefore\) therefore, thus \therefore
\(\eta\) linear component of a GLM \eta
\(\left \lfloor{x}\right \rfloor\) floor of \(x\): largest integer less than or equal to \(x\) \lfloor x \rfloor
\(\left \lceil{x}\right \rceil\) ceiling of \(x\): smallest integer greater than or equal to \(x\) \lceil x \rceil
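As a quick check of the floor and ceiling definitions, here is a minimal Python sketch using the standard library; note the negative case, where "round toward zero" and "floor" disagree:

```python
# Floor and ceiling via the standard library.
import math

assert math.floor(2.7) == 2    # largest integer <= 2.7
assert math.ceil(2.3) == 3     # smallest integer >= 2.3
assert math.floor(-2.3) == -3  # floor of a negative goes down, not toward zero
assert math.floor(5) == 5      # an integer is its own floor (hence "<=", not "<")
```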

1 Information matrices

There is no consistency in the notation for observed and expected information matrices (see Table 2).

Table 2: notation for information matrices
book observed information expected information
Dobson and Barnett (2018) \(U'\) \(\mathfrak{I}\)
Dunn and Smyth (2018) \(\mathfrak{I}\) \(\mathcal{I}\)
McLachlan and Krishnan (2007) \(I\) \(\mathcal{I}\)
Wood (2017) \(\hat{I}\) \(\mathcal{I}\)

These notes currently have a mixture of notations, depending on my whims and what reference I had last looked at. Eventually, I will try to standardize my notation to \(I\) for observed information and \(\mathcal{I}\) for expected information.
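To make the observed/expected distinction concrete, here is a minimal sketch for a binomial proportion (a hypothetical example, not drawn from any of the books in Table 2). The observed information \(I\) is the negative second derivative of the log-likelihood at the data; the expected information \(\mathcal{I}\) takes the expectation over the data first. For the binomial, the two coincide at the MLE:

```python
# Hypothetical data: y successes in n Bernoulli trials.
n, y = 20, 7
p_hat = y / n  # MLE of the success probability p

# Log-likelihood: l(p) = y*log(p) + (n - y)*log(1 - p)
# Observed information: I(p) = -l''(p) = y/p^2 + (n - y)/(1 - p)^2
obs_info = y / p_hat**2 + (n - y) / (1 - p_hat)**2

# Expected information: E[-l''(p)] with E[y] = n*p, giving n / (p*(1 - p))
exp_info = n / (p_hat * (1 - p_hat))

# For the binomial, evaluating both at the MLE gives the same number.
assert abs(obs_info - exp_info) < 1e-9
```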

2 Percent sign (“%”)

The percent sign “%” is just shorthand for “\(/100\)”. The word “percent” comes from the Latin “per centum”; “centum” is Latin for 100, so “percent” means “per hundred” (cf. https://en.wikipedia.org/wiki/Percentage).

So, contrary to what you may have learned previously, \(10\% = 0.1\) is a true and correct equality, just as \(10 \text{kg} = 10,000 \text{g}\) is true and correct.

Proof. \[ \begin{aligned} 10\% &= 10 / 100 \\ &= \frac{10}{100} \\ &= 0.1 \end{aligned} \]

You are welcome to switch between decimal and percent notation freely; just make sure you execute it correctly.
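A trivial sketch of the substitution in code (the `percent` helper is an illustrative name, not a standard function):

```python
def percent(x):
    """Interpret x% as the number x / 100."""
    return x / 100

assert percent(10) == 0.1   # 10% = 0.1
assert percent(250) == 2.5  # percentages above 100 are fine too
```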

3 Proofs


We can use any of:

  • \(\therefore\) (\therefore in LaTeX),
  • \(\Rightarrow\) (\Rightarrow),
  • \(\models\) (\models)

to denote logical entailments (deductive consequences).

Let’s save \(\rightarrow\) (\rightarrow) for convergence results.

4 Stochastic vs. probabilistic vs. random


The terms “stochastic”, “probabilistic”, and “random” are frequently used in statistics and probability theory, often interchangeably in everyday conversation, but they carry nuanced technical distinctions.

4.1 Key distinction: modeling approach vs. phenomena

As noted in Wikipedia:

Stochasticity and randomness are technically distinct concepts: the former refers to a modeling approach, while the latter describes phenomena; in everyday conversation these terms are often used interchangeably.

4.2 Definitions

Random describes something that occurs by chance, without a deterministic pattern. It is the most general term, used to describe variables or occurrences whose outcome cannot be predicted precisely, only probabilistically. For example, we speak of “random variables” and “random events”.

Note

The term “random” is sometimes used as shorthand for a uniform distribution (especially the discrete uniform distribution), but it can refer to any probability distribution.

Stochastic comes from the Greek “στόχος” (stókhos), meaning “aim” or “guess” (see etymology). In mathematics, a stochastic process is formally defined as a collection of random variables indexed by time or space. The term is almost always used in the context of processes or systems evolving in time or space under uncertain rules. Note that in probability theory, “stochastic process” and “random process” are synonyms (Adler and Taylor 2009; Stirzaker 2005; Kallenberg 2002).

Probabilistic refers to any model, reasoning, or method that explicitly involves probability theory. Probabilistic models assign probabilities to events or outcomes; they focus on quantifying and reasoning about uncertainty based on known or estimated distributions. While all stochastic models are probabilistic (since they use probabilities), not all probabilistic models need to describe processes evolving in time.

4.3 Summary of usage

Table 3: Comparison of “random”, “stochastic”, and “probabilistic”
Term What it describes Typical use Example
Random Single variable or event Random variable, random outcome Coin toss, die roll
Stochastic System or process in time/space Stochastic process Stock price evolution, Markov chain
Probabilistic Approach/model using probability Probabilistic model/reasoning Bayesian inference, regression

While some sources treat “stochastic” and “random” as practically synonymous, the academic preference is to use “random” for variables and events, and “stochastic” for processes, especially to highlight temporal or spatial structure in the modeling.

5 Why is notation in probability and statistics so inconsistent and disorganized?


In grad school, we are asked to learn from increasingly disorganized materials and lectures. Not coincidentally, as the amount of organization decreases, the complexity increases, the difficulty increases, the number of reliable references decreases, and the inconsistency in notation and content increases (both between multiple references and within single references!). In other words, as you approach the cutting edge of most fields, you start to encounter content that hasn’t been fully thought through or standardized. This lack of clarity is unfortunate and undesirable, but it is understandable and inevitable.

It’s worth noting that calculus was formalized in the 1600s, elementary algebra was formalized around 820, and arithmetic even earlier. And calculus still has several competing notation systems. In contrast, the field of statistics only emerged in the late 1800s and early 1900s, so it’s not surprising that its notation and terminology are still developing. Generalized linear models were only formalized in 1972 (Nelder and Wedderburn 1972), which is very recent in terms of the pace of scientific development.


References

Adler, Robert J., and Jonathan E. Taylor. 2009. Random Fields and Geometry. Springer. https://doi.org/10.1007/978-0-387-48116-6.
Dobson, Annette J, and Adrian G Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. CRC Press. https://doi.org/10.1201/9781315182780.
Dunn, Peter K, and Gordon K Smyth. 2018. Generalized Linear Models with Examples in R. Vol. 53. Springer. https://link.springer.com/book/10.1007/978-1-4419-0118-7.
Kallenberg, Olav. 2002. Foundations of Modern Probability. 2nd ed. Springer. https://doi.org/10.1007/978-1-4757-4015-8.
McLachlan, Geoffrey J, and Thriyambakam Krishnan. 2007. The EM Algorithm and Extensions. 2nd ed. John Wiley & Sons. https://doi.org/10.1002/9780470191613.
Nelder, John Ashworth, and Robert WM Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society Series A: Statistics in Society 135 (3): 370–84.
Stirzaker, David. 2005. Stochastic Processes and Models. Oxford University Press. https://global.oup.com/academic/product/stochastic-processes-and-models-9780198568131.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.

Footnotes

  1. Depending on whether it is applied to a matrix (transpose) or a function (derivative).