# Notation
| symbol | meaning | LaTeX |
|---|---|---|
| \(\neg\) | not | \neg |
| \(\forall\) | all | \forall |
| \(\exists\) | some | \exists |
| \(\cup\) | union, “or” | \cup |
| \(\cap\) | intersection, “and” | \cap |
| \(\mid\) | given, conditional on | \mid, \vert |
| \(\sum\) | sum | \sum |
| \(\prod\) | product | \prod |
| \(\mu\) | mean | \mu |
| \(\mathbb{E}\) | expectation | \mathbb{E} |
| \(x^{\top}\) | transpose of \(x\) | x^{\top} |
| \('\) | transpose or derivative¹ | ' |
| \(\perp\!\!\!\perp\) | independent | \perp\!\!\!\perp |
| \(\therefore\) | therefore, thus | \therefore |
| \(\eta\) | linear component of a GLM | \eta |
| \(\left \lfloor{x}\right \rfloor\) | floor of \(x\): largest integer not greater than \(x\) | \lfloor x \rfloor |
| \(\left \lceil{x}\right \rceil\) | ceiling of \(x\): smallest integer not less than \(x\) | \lceil x \rceil |
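The floor and ceiling definitions can be checked directly in code; a minimal sketch using Python's standard `math` module (note the behavior for negative arguments, where "round toward negative infinity" differs from truncation):

```python
import math

# Floor: largest integer <= x; ceiling: smallest integer >= x.
print(math.floor(2.7))   # 2
print(math.ceil(2.3))    # 3
print(math.floor(-1.5))  # -2 (floor rounds toward negative infinity, not toward zero)
print(math.ceil(-1.5))   # -1
```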
# 1 Information matrices
There is no consistency in the notation for observed and expected information matrices (see Table 2).
These notes currently have a mixture of notations, depending on my whims and what reference I had last looked at. Eventually, I will try to standardize my notation to \(I\) for observed information and \(\mathcal{I}\) for expected information.
# 2 Percent sign (“%”)
The percent sign “%” is just shorthand for “\(/100\)”. The word “percent” comes from the Latin “per centum”; “centum” is Latin for 100, so “percent” means “per hundred” (cf. https://en.wikipedia.org/wiki/Percentage).
So, contrary to what you may have learned previously, \(10\% = 0.1\) is a true and correct equality, just as \(10 \text{kg} = 10,000 \text{g}\) is true and correct.
Proof. \[ \begin{aligned} 10\% &= 10 / 100 \\ &= \frac{10}{100} \\ &= 0.1 \end{aligned} \]
You are welcome to switch between decimal and percent notation freely; just make sure you execute it correctly.
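A sketch of executing that switch correctly in code; the helper names `percent_to_decimal` and `decimal_to_percent` are illustrative, not standard library functions:

```python
def percent_to_decimal(p):
    """Interpret p as a percentage, e.g. 10 (i.e., 10%) -> 0.1."""
    return p / 100

def decimal_to_percent(d):
    """Interpret d as a proportion, e.g. 0.1 -> 10 (i.e., 10%)."""
    return d * 100

print(percent_to_decimal(10))   # 0.1
print(decimal_to_percent(0.1))  # 10.0
```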
# 3 Proofs
We can use any of:
- \(\therefore\) (\therefore in LaTeX),
- \(\Rightarrow\) (\Rightarrow),
- \(\models\) (\models)
to denote logical entailments (deductive consequences).
Let’s save \(\rightarrow\) (\rightarrow) for convergence results.
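For instance, under this convention one might write (a sketch, with \(\bar{X}_n\) denoting the sample mean of i.i.d. draws with mean \(\mu\)):

```latex
% \therefore and \Rightarrow mark deductive steps:
\mathbb{E}[X_i] = \mu
  \;\Rightarrow\; \mathbb{E}[\bar{X}_n] = \mu
  \;\therefore\; \bar{X}_n \text{ is unbiased for } \mu.
% while \rightarrow is reserved for convergence (law of large numbers):
\bar{X}_n \rightarrow \mu \quad \text{as } n \rightarrow \infty.
```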
# 4 Stochastic vs. probabilistic vs. random
The terms “stochastic”, “probabilistic”, and “random” are frequently used in statistics and probability theory, often interchangeably in everyday conversation, but they carry nuanced technical distinctions.
## 4.1 Key distinction: modeling approach vs. phenomena
As noted in Wikipedia:
> Stochasticity and randomness are technically distinct concepts: the former refers to a modeling approach, while the latter describes phenomena; in everyday conversation these terms are often used interchangeably.
## 4.2 Definitions
Random describes something that occurs by chance, without a deterministic pattern. It is the most general term, used to describe variables or occurrences whose outcome cannot be predicted precisely, only probabilistically. For example, we speak of “random variables” and “random events”.
The term “random” is sometimes used as shorthand for a uniform distribution (especially the discrete uniform distribution), but it can refer to any probability distribution.
Stochastic comes from the Greek “στόχος” (stókhos), meaning “aim” or “guess” (see etymology). In mathematics, a stochastic process is formally defined as a collection of random variables indexed by time or space. The term is almost always used in the context of processes or systems evolving in time or space under uncertain rules. Note that in probability theory, “stochastic process” and “random process” are synonyms (Adler and Taylor 2009; Stirzaker 2005; Kallenberg 2002).
Probabilistic refers to any model, reasoning, or method that explicitly involves probability theory. Probabilistic models assign probabilities to events or outcomes; they focus on quantifying and reasoning about uncertainty based on known or estimated distributions. While all stochastic models are probabilistic (since they use probabilities), not all probabilistic models need to describe processes evolving in time.
## 4.3 Summary of usage
| Term | What it describes | Typical use | Example |
|---|---|---|---|
| Random | Single variable or event | Random variable, random outcome | Coin toss, die roll |
| Stochastic | System or process in time/space | Stochastic process | Stock price evolution, Markov chain |
| Probabilistic | Approach/model using probability | Probabilistic model/reasoning | Bayesian inference, regression |
While some sources treat “stochastic” and “random” as practically synonymous, the academic preference is to use “random” for variables and events, and “stochastic” for processes, especially to highlight temporal or spatial structure in the modeling.
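To make the distinction concrete, a minimal sketch in Python (the names `coin` and `walk` are illustrative): a single random draw is a random variable, while a random walk is a stochastic process, i.e., a collection of random variables indexed by time.

```python
import random

random.seed(0)  # for reproducibility

# A random variable: a single uncertain quantity (one coin toss).
coin = random.choice(["H", "T"])

# A stochastic process: random variables indexed by time —
# here a simple random walk X_0, X_1, ..., X_10 with ±1 steps.
walk = [0]
for t in range(10):
    walk.append(walk[-1] + random.choice([-1, 1]))

print(coin)
print(walk)  # 11 positions, starting at 0
```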
## 4.4 Additional resources
# 5 Why is notation in probability and statistics so inconsistent and disorganized?
In grad school, we are asked to learn from increasingly disorganized materials and lectures. Not coincidentally, as the amount of organization decreases, the complexity increases, the difficulty increases, the number of reliable references decreases, and the inconsistency in notation and content increases (both between references and within a single reference!). In other words, as you approach the cutting edge of most fields, you start to encounter content that hasn’t been fully thought through or standardized. This lack of clarity is unfortunate and undesirable, but it is understandable and inevitable.
It’s worth noting that calculus was formalized in the 1600s, elementary algebra was formalized around 820 AD, and arithmetic even earlier. And calculus still has several competing notation systems. In contrast, the field of statistics only emerged in the late 1800s and early 1900s, so it’s not surprising that its notation and terminology are still developing. Generalized linear models were only formalized in 1972 (Nelder and Wedderburn 1972), which is very recent in terms of the pace of scientific development.
# References
# Footnotes

1. depending on whether it is applied to a matrix or a function