Regression Models for Epidemiology
Preface
This web-book is derived from my lecture slides for Epidemiology 204: “Quantitative Epidemiology III: Statistical Models”, at UC Davis.
I have drawn these materials from many sources, including but not limited to:
David Rocke’s materials from the 2021 edition of this course
Hua Zhou’s materials from the 2020 edition of Biostat 200C at UCLA
Vittinghoff et al. (2012)
Dobson and Barnett (2018)
Harrell (2015)
Using these lecture notes
These lecture notes are available online at https://d-morrison.github.io/rme/. The online notes are searchable and are currently being iteratively updated. A PDF version of the notes is also downloadable from https://d-morrison.github.io/rme/Regression-Models-for-Epidemiology.pdf, and the source files are available at https://github.com/d-morrison/rme.
Compiling chapters as lecture slide decks
Each chapter’s source file can also be compiled as a lecture slide deck, using the _quarto-revealjs.yml Quarto profile included in the git repository on GitHub.
For example, to compile Chapter 3 Models for Binary Outcomes as a slide deck:
Clone the project repository from GitHub.
Install the project dependencies using devtools:
library(devtools) # install from CRAN if needed
devtools::install_deps()
Render the chapter with the revealjs profile, using the following terminal shell command:
quarto render logistic-regression.qmd --profile=revealjs
You can also render all the chapters listed in the _quarto-revealjs.yml Quarto profile as slide decks simultaneously:
quarto render --profile=revealjs
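As a rough sketch, the single-chapter steps above could be wrapped in a small shell function. The helper names run and render_chapter_slides and the DRY_RUN variable are illustrative choices of mine, not part of the repository. The sketch assumes that git, R (with devtools already installed), and the Quarto CLI are on your PATH, and that the chapter file sits at the top level of the cloned repository.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: clone the repository, install the R dependencies,
# and render one chapter source file with the revealjs profile.
# Each step runs only if the previous one succeeded.
render_chapter_slides() {
  chapter="$1"
  run git clone https://github.com/d-morrison/rme &&
    run cd rme &&
    run Rscript -e 'devtools::install_deps()' &&
    run quarto render "$chapter" --profile=revealjs
}
```

After sourcing these definitions, calling render_chapter_slides logistic-regression.qmd would run the three steps in order. Setting DRY_RUN=1 first prints the commands without executing them, which may be useful for checking what will happen before cloning anything.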
Extracting LaTeX commands from the online version of the notes
If you want to extract the LaTeX commands for any math expressions in the online lecture notes, you should be able to right-click and get this pop-up menu:
If you select “TeX commands”, you will get a window with LaTeX code.
You can also grab the TeX commands from the Quarto source files on GitHub, but those files use custom macros (defined in https://github.com/d-morrison/rme/blob/main/macros.qmd), so it’s a little harder to reuse code from the source files.
Dark Mode
The online notes have two color palette themes: light and dark. You can toggle between them using the oval button near the top-left corner:
Other resources
These notes represent my still-developing perspective on regression models in epidemiology. Many other statisticians and epidemiologists have published their own perspectives, and I encourage you to explore your many options and find ones that resonate with you. I have attempted to cite my sources throughout these notes.
Here are some additional resources that I’ve come across; I haven’t had time to read some of them thoroughly yet, but they’re all on my to-do list. I’ll add my thoughts on them over time.
Dobson and Barnett (2018) is a classic textbook on GLMs. It was used in UCLA Biostatistics’s MS-level GLMs course (Biostat 200C) when I took it, and it helped me a lot. It is fairly mathematically rigorous and concise, bordering on terse. It covers GLMs in detail, and survival analysis briefly, and it also has helpful chapters on Bayesian methods. I have adapted examples and explanations from it extensively in these notes.
Hosmer, Lemeshow, and Sturdivant (2013) is a classic text on logistic regression. I haven’t read it yet.
Agresti (2012) is another classic text for GLMs. I haven’t read it yet.
Agresti (2018) appears to be a more applied version of Agresti (2012). I haven’t read it yet. There are extra exercises and other resources available on the Student Companion Site.
Agresti (2015) has “More than 400 exercises for readers to practice and extend the theory, methods, and data analysis”; might be more theoretical?
Agresti (2010) is specifically about ordinal data.
Dunn and Smyth (2018) is a recent textbook on GLMs. It doesn’t cover time-to-event models, and it doesn’t use the modern tidyverse packages (ggplot2, dplyr, etc.), but otherwise it seems great.
Moore (2016) is a recent textbook on survival analysis. It also doesn’t use the tidyverse, but otherwise seems great.
Klein and Moeschberger (2003) is a classic text for survival analysis. I read most of it in grad school, and it was very helpful. Examples and explanations from it are borrowed extensively in the second half of these notes (partially filtered through David Rocke’s course notes).
Kalbfleisch and Prentice (2011) is another classic survival analysis text; I haven’t read it yet.
Kleinbaum and Klein (2010) is a mostly applied-level “self-learning” text for logistic regression; I read it cover-to-cover before grad school, and found it very helpful.
Kleinbaum and Klein (2012) is the corresponding “self-learning” text for survival analysis; I read it cover-to-cover before grad school, and found it very helpful.
Harrell (2015) is another popular textbook. It uses ggplot2 but not dplyr, and covers logistic regression and survival analysis (no Poisson or NB models?). An abbreviated but continuously updated version with audio clips is available at https://hbiostat.org/rmsc/.
McCullagh and Nelder (1989) is a classic, theoretical textbook on GLMs.
Dalgaard (2008) covers GLMs and survival analysis at an applied level, using base R
Vittinghoff et al. (2012) covers GLMs, survival analysis, and causal inference, using Stata. The authors are UCSF professors, and it is used for the core Epi PhD courses there. I read this book nearly cover-to-cover before grad school, and it was hugely helpful for me, both for statistical modeling and for causal inference (I think it provided my first exposure to DAGs).
Faraway (2016) has GLMs but not survival analysis
Selvin (2001) provides worked-out examples of applications for a wide range of statistical analysis techniques. The author is a retired UC Berkeley Biostatistics professor; he used it in a graduate-level biostat/epi course.
Jewell (2003) is by another UC Berkeley professor; it mostly covers logistic regression, with one chapter on survival analysis.
https://ucla-biostat-200c-2020spring.github.io/schedule/schedule.html provides course notes for “Biostat 200C - Methods in Biostatistics C” at UCLA, which is at the Biostatistics MS level.
https://online.stat.psu.edu/stat504/book/ provides course notes for “STAT 504 - Analysis of Discrete Data” at Penn State University. It includes logistic regression and Poisson regression, as well as 2-way tables and other related topics, and includes SAS code.
Nahhas (2024) is currently in development.
Clayton and Hills (2013) covers binary regression, count regression, and survival analysis. Haven’t started it yet.
https://thomaselove.github.io/2020-432-book/index.html is another set of lecture notes.
Woodward (2013) covers GLMs and survival; haven’t read it yet, but it looks comprehensive.
Roback and Legler (2021) is recent and uses the tidyverse; doesn’t appear to cover survival analysis.
Wood (2017) is about generalized additive models but includes a detailed summary of GLMs.
Kutoyants (2023) appears to be a complete book on Poisson models.
Hardin and Hilbe (2018) uses Stata.
Andrews and Herzberg (2012) is a classic “learn-by-example” book with many datasets amenable to GLMs
Cannell and Livingston (2024) is another open-source, online textbook like this one; it is primarily about statistical programming, but it includes full chapters on linear regression, logistic regression, and Poisson regression. There is currently (2024/06) a placeholder chapter for survival analysis.
Gelman and Hill (2006) covers GLMs as well as hierarchical extensions of GLMs. No survival models?
Soch (2023) is a collection of proofs for results in probability, statistics, and related computational sciences.
Suárez et al. (2017) covers GLMs but not survival analysis
https://drive.google.com/file/d/1VwosGvHtRtKnC7P3ja7RAUawvvudgc9T/view
License
This book is licensed to you under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The code samples in this book are licensed under Creative Commons CC0 1.0 Universal (CC0 1.0), i.e. public domain.