Regression Models for Epidemiology

Author
Affiliation

Department of Public Health Sciences, School of Medicine, University of California, Davis

Published

Last modified: 2024-10-12: 12:18:41 (PM)

Preface

This web-book is derived from my lecture slides for Epidemiology 204: “Quantitative Epidemiology III: Statistical Models”, at UC Davis.

I have drawn these materials from many sources, including but not limited to:

Using these lecture notes

These lecture notes are available online at https://d-morrison.github.io/rme/. The online notes are searchable and are being updated iteratively.[1] A PDF version of the notes can be downloaded from https://d-morrison.github.io/rme/Regression-Models-for-Epidemiology.pdf, and the source files are available at https://github.com/d-morrison/rme.

Compiling chapters as lecture slide decks

Each chapter’s source file can also be compiled as a lecture slide deck, using the _quarto-revealjs.yml Quarto profile included in the Git repository on GitHub.

For example, to compile Chapter 3  Models for Binary Outcomes as a slide deck:

  1. Install Quarto.

  2. Clone the project repository from GitHub.

  3. Install the project dependencies using devtools:

library(devtools) # install from CRAN if needed
devtools::install_deps()
  4. Render the chapter with the revealjs profile by running the following command in a terminal:

quarto render logistic-regression.qmd --profile=revealjs

You can also render all the chapters listed in the _quarto-revealjs.yml Quarto profile as slide decks simultaneously:

quarto render --profile=revealjs
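
The steps above can be sketched as a single shell script. This is only an illustration, not part of the repository: the helper names run and build_slides and the DRY_RUN variable are hypothetical, and the script assumes that git, Rscript, and quarto are already on your PATH and that devtools is installed (step 1 stays a manual step). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 also executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Echo the command, then execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_slides() {
  # $1: optional chapter source file. With no argument, render every
  # chapter listed in the revealjs profile.
  run git clone https://github.com/d-morrison/rme.git &&
    run cd rme &&
    run Rscript -e 'devtools::install_deps()' || return 1
  if [ -n "$1" ]; then
    run quarto render "$1" --profile=revealjs
  else
    run quarto render --profile=revealjs
  fi
}

# Dry run for the Chapter 3 example used above.
build_slides logistic-regression.qmd
```

Running the script with DRY_RUN=0 in an empty working directory would execute the same four commands instead of just printing them.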

Extracting LaTeX commands from the online version of the notes

If you want to extract the LaTeX commands for any math expressions in the online lecture notes, you should be able to right-click and get this pop-up menu:

Figure 1: Pop-up menu produced by right-clicking on math in online notes

If you select “TeX commands”, you will get a window with LaTeX code.[2]

Figure 2: LaTeX source code window

You can also grab the TeX commands from the Quarto source files on GitHub, but those files use custom macros (defined in https://github.com/d-morrison/rme/blob/main/macros.qmd), so it’s a little harder to reuse code from the source files.


Dark Mode

The online notes have two color palette themes: light and dark. You can toggle between them using the oval button near the top-left corner:

Figure 3: Palette toggle

Other resources

These notes represent my still-developing perspective on regression models in epidemiology. Many other statisticians and epidemiologists have published their own perspectives, and I encourage you to explore your many options and find ones that resonate with you. I have attempted to cite my sources throughout these notes.

Here are some additional resources that I’ve come across; I haven’t had time to read some of them thoroughly yet, but they’re all on my to-do list. I’ll add my thoughts on them over time.

  • Dobson and Barnett (2018) is a classic textbook on GLMs. It was used in UCLA Biostatistics’s MS-level GLMs course (Biostat 200C) when I took it, and it helped me a lot. It is fairly mathematically rigorous and concise, bordering on terse. It covers GLMs in detail, and survival analysis briefly, and it also has helpful chapters on Bayesian methods. I have adapted examples and explanations from it extensively in these notes.

  • Hosmer, Lemeshow, and Sturdivant (2013) is a classic text on logistic regression. I haven’t read it yet.

  • Agresti (2012) is another classic text for GLMs. I haven’t read it yet.

  • Agresti (2018) appears to be a more applied version of Agresti (2012). I haven’t read it yet. There are extra exercises and other resources available on the Student Companion Site.

  • Agresti (2015) has “More than 400 exercises for readers to practice and extend the theory, methods, and data analysis”; might be more theoretical?

  • Agresti (2010) is specifically about ordinal data.

  • Dunn and Smyth (2018) is a recent textbook on GLMs. It doesn’t cover time-to-event models, and it doesn’t use the modern tidyverse packages (ggplot2, dplyr, etc.), but otherwise it seems great.

  • Moore (2016) is a recent textbook on survival analysis. It also doesn’t use the tidyverse, but otherwise seems great.

  • Klein and Moeschberger (2003) is a classic text for survival analysis. I read most of it in grad school, and it was very helpful. I have borrowed examples and explanations from it extensively in the second half of these notes (partially filtered through David Rocke’s course notes).

  • Kalbfleisch and Prentice (2011) is another classic survival analysis text; I haven’t read it yet.

  • Kleinbaum and Klein (2010) is a mostly applied-level “self-learning” text for logistic regression; I read it cover-to-cover before grad school, and found it very helpful.

  • Kleinbaum and Klein (2012) is the corresponding “self-learning” text for survival analysis; I read it cover-to-cover before grad school, and found it very helpful.

  • Harrell (2015) is another popular textbook. It uses ggplot2 but not dplyr, and covers logistic regression and survival analysis (no Poisson or NB models?). An abbreviated but continuously updated version with audio clips is available at https://hbiostat.org/rmsc/.

  • Fox (2015) is another standard text.[3]

  • McCullagh and Nelder (1989) is a classic, theoretical textbook on GLMs.[4]

  • Dalgaard (2008) covers GLMs and survival analysis at an applied level, using base R

  • Vittinghoff et al. (2012) covers GLMs, survival analysis, and causal inference, using Stata. The authors are UCSF professors, and it is used for the core Epi PhD courses there. I read this book nearly cover-to-cover before grad school, and it was hugely helpful for me, both for statistical modeling and for causal inference (I think it provided my first exposure to DAGs).

  • Faraway (2016) has GLMs but not survival analysis

  • Selvin (2001) provides worked-out examples of applications for a wide range of statistical analysis techniques. The author is a retired UC Berkeley Biostatistics professor; he used it in a graduate-level biostat/epi course.

  • Jewell (2003) is by another UC Berkeley professor; it mostly covers logistic regression, with one chapter on survival analysis.

  • https://ucla-biostat-200c-2020spring.github.io/schedule/schedule.html provides course notes for “Biostat 200C - Methods in Biostatistics C” at UCLA, which is at the Biostatistics MS level.

  • https://online.stat.psu.edu/stat504/book/ provides course notes for “STAT 504 - Analysis of Discrete Data” at Penn State University. It includes logistic regression and Poisson regression, as well as 2-way tables and other related topics, and includes SAS code.

  • Nahhas (2024) is currently in development.

  • Clayton and Hills (2013) covers binary regression, count regression, and survival analysis. Haven’t started it yet.

  • https://thomaselove.github.io/2020-432-book/index.html is another set of lecture notes.

  • Woodward (2013) covers GLMs and survival; haven’t read it yet, but it looks comprehensive.

  • Roback and Legler (2021) is recent and uses the tidyverse; doesn’t appear to cover survival analysis.

  • Wood (2017) is about generalized additive models but includes a detailed summary of GLMs.

  • Kutoyants (2023) appears to be a complete book on Poisson models.

  • Hardin and Hilbe (2018) uses Stata.

  • Andrews and Herzberg (2012) is a classic “learn-by-example” book with many datasets amenable to GLMs

  • Cannell and Livingston (2024) is another open-source, online textbook like this one; it is primarily about statistical programming, but it includes full chapters on linear regression, logistic regression, and Poisson regression. There is currently (2024/06) a placeholder chapter for survival analysis.

  • Gelman and Hill (2006) covers GLMs as well as hierarchical extensions of GLMs. No survival models?

  • Soch (2023) is a collection of proofs for results in probability, statistics, and related computational sciences.

  • Suárez et al. (2017) covers GLMs but not survival analysis

  • https://drive.google.com/file/d/1VwosGvHtRtKnC7P3ja7RAUawvvudgc9T/view

License

This book is licensed to you under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The code samples in this book are licensed under Creative Commons CC0 1.0 Universal (CC0 1.0), i.e. public domain.


  1. See the source file repository for recent changes: https://github.com/d-morrison/rme

  2. MathJax is more or less a dialect of LaTeX.

  3. I don’t have anything to say about this book, because I haven’t opened it yet, but I’ve heard it’s great!

  4. I haven’t opened it either.