4  Models for Count Outcomes

Poisson regression and variations

Published

Last modified: 2024-05-16: 19:54:50 (PM)


Acknowledgements

This content is adapted from:

  • Dobson and Barnett (2018), Chapter 9
  • Vittinghoff et al. (2012), Chapter 8


Configuring R

Functions from these packages will be used throughout this document:

Show R code
library(conflicted) # check for conflicting function definitions
# library(printr) # inserts help-file output into markdown output
library(rmarkdown) # Convert R Markdown documents into a variety of formats.
library(pander) # format tables for markdown
library(ggplot2) # graphics
library(ggeasy) # help with graphics
library(ggfortify) # help with graphics
library(dplyr) # manipulate data
library(tibble) # `tibble`s extend `data.frame`s
library(magrittr) # `%>%` and other additional piping tools
library(haven) # import Stata files
library(knitr) # format R output for markdown
library(tidyr) # Tools to help to create tidy data
library(plotly) # interactive graphics
library(dobson) # datasets from Dobson and Barnett 2018
library(parameters) # format model output tables for markdown
library(haven) # import Stata files
library(latex2exp) # use LaTeX in R code (for figures and tables)
library(fs) # filesystem path manipulations
library(survival) # survival analysis
library(survminer) # survival analysis graphics
library(KMsurv) # datasets from Klein and Moeschberger
library(parameters) # format model output tables for
library(webshot2) # convert interactive content to static for pdf
library(forcats) # functions for categorical variables ("factors")
library(stringr) # functions for dealing with strings
library(lubridate) # functions for dealing with dates and times

Here are some R settings I use in this document:

Show R code
rm(list = ls()) # delete any data that's already loaded into R

conflicts_prefer(dplyr::filter)
ggplot2::theme_set(
  ggplot2::theme_bw() + 
        # ggplot2::labs(col = "") +
    ggplot2::theme(
      legend.position = "bottom",
      text = ggplot2::element_text(size = 12, family = "serif")))

knitr::opts_chunk$set(message = FALSE)
options('digits' = 4)

panderOptions("big.mark", ",")
pander::panderOptions("table.emphasize.rownames", FALSE)
pander::panderOptions("table.split.table", Inf)
conflicts_prefer(dplyr::filter) # use the `filter()` function from dplyr() by default
legend_text_size = 9

4.1 Introduction

This chapter presents models for count data outcomes. With covariates, the event rate \(\lambda\) becomes a function of the covariates \(\tilde{X}= (X_1, \dots,X_n)\). Typically, count data models use a \(\text{log}\left\{\right\}\) link function, and thus an \(\text{exp}\left\{\right\}\) inverse-link function. That is:

\[ \begin{aligned} \mathbb{E}[Y | \tilde{X}= \tilde{x}, T = t] &= \mu(\tilde{x},t) \\ \mu(\tilde{x},t) &= \lambda(\tilde{x})\cdot t \\ \lambda(\tilde{x}) &= \text{exp}\left\{\eta(\tilde{x})\right\} \\ \eta(\tilde{x}) &= \tilde{x}'\tilde \beta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \end{aligned} \]

Therefore, \[ \begin{aligned} \text{log}\left\{\{\mathbb{E}[Y | \tilde{X}= \tilde{x},T=t] \}\right\} &= \text{log}\left\{\{\mu(\tilde{x})\}\right\}\\ &=\text{log}\left\{\{\lambda(\tilde{x}) \cdot t \}\right\}\\ &=\text{log}\left\{\lambda(\tilde{x})\right\} + \text{log}\left\{t\right\}\\ &=\text{log}\left\{\text{exp}\left\{\eta(\tilde{x})\right\}\right\} + \text{log}\left\{t\right\}\\ &=\eta(\tilde{x}) + \text{log}\left\{t\right\}\\ &=\tilde{x}'\tilde\beta + \text{log}\left\{t\right\}\\ &=(\beta_0 +\beta_1 x_1+\dots + \beta_p x_p) + \text{log}\left\{t\right\}\\ \end{aligned} \]

In contrast with the \(X\)s, \(T\) enters this expression with a \(\text{log}\left\{\right\}\) transformation and without a corresponding \(\beta\) coefficient.

Note

Terms that enter the linear component of a model without a coefficient, such as \(\text{log}\left\{t\right\}\) here, are called offsets.

4.1.1 Rate ratios

Differences on the log-rate scale become ratios on the rate scale, because

\[\text{exp}\left\{a-b\right\} = \frac{\text{exp}\left\{a\right\}}{\text{exp}\left\{b\right\}}\]

[math-preqs.qmd#cor-exp-sum]

Therefore, according to this model, differences of \(\delta\) in covariate \(x_j\) correspond to rate ratios of \(\text{exp}\left\{\beta_j \cdot \delta\right\}\).

That is, letting \(\tilde{X}_{-j}\) denote vector \(\tilde{X}\) with element \(j\) removed:

\[ \begin{aligned} &{ \left\{ \text{log}\left\{\mathbb{E}[Y |{\color{red}{X_j = a}}, \tilde{X}_{-j}=\tilde{x}_{-j},T=t]\right\} \atop {-\text{log}\left\{\mathbb{E}[Y |{\color{red}{X_j = b}}, \tilde{X}_{-j}=\tilde{x}_{-j},T=t]\right\}} \right\} }\\ &= {\left\{ \text{log}\left\{t\right\} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (a)}} + ...+\beta_p x_p \atop {-\text{log}\left\{t\right\} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (b)}} + ...+\beta_p x_p} \right\}}\\ &= \color{red}{\beta_j(a-b)} \end{aligned} \]

And accordingly,

\[ \begin{aligned} \frac {\mathbb{E}[Y |{\color{red}{X_j = a}}, \tilde{X}_{-j} = \tilde{x}_{-j}, T = t] } { \mathbb{E}[Y |{\color{red}{X_j = b}}, \tilde{X}_{-j}=\tilde{x}_{-j},T=t] } = \text{exp}\left\{{\color{red}{\beta_j(a-b)}}\right\} \end{aligned} \]

4.2 Inference for count regression models

4.2.1 Confidence intervals for regression coefficients and rate ratios

As usual:

\[ \beta \in \left[\hat \beta{\color{red}\pm} z_{1 - \frac{\alpha}{2}}\cdot \hat{\text{se}}\left(\hat \beta\right)\right] \]

Rate ratios: exponentiate CI endpoints

\[ \text{exp}\left\{\beta\right\} \in \left[\text{exp}\left\{\hat \beta{\color{red}\pm} z_{1 - \frac{\alpha}{2}}\cdot \hat{\text{se}}\left(\hat \beta\right)\right\} \right] \]

4.2.2 Hypothesis tests for regression coefficients

\[ t = \frac{\hat \beta - \beta_0}{\hat{\text{se}}\left(\hat \beta\right)} \]

Compare \(t\) or \(|t|\) to the tails of the standard Gaussian distribution, according to the null hypothesis.

4.2.3 Comparing nested models

log(likelihood ratio) tests, as usual.

4.3 Prediction

\[ \begin{aligned} \hat y &\stackrel{\text{def}}{=}\hat{\mathbb{E}}[Y|\tilde{X}= \tilde{x},T=t]\\ &=\hat\mu(\tilde{x}, t)\\ &=\hat\lambda(\tilde{x}) \cdot t\\ &=\text{exp}\left\{\hat\eta(\tilde{x})\right\} \cdot t\\ &=\text{exp}\left\{\tilde{x}'\hat{\boldsymbol{\beta}}\right\} \cdot t \end{aligned} \]

4.4 Diagnostics

4.4.1 Residuals

Observation residuals

\[e \stackrel{\text{def}}{=}y - \hat y\]

Pearson residuals

\[r = \frac{e}{\hat{\text{se}}\left(e\right)} \approx \frac{e}{\sqrt{\hat y}}\]

Standardized Pearson residuals

\[r_p = \frac{r}{\sqrt{1-h}}\] where \(h\) is the “leverage” (which we will continue to leave undefined).

Deviance residuals

\[ d_k = \text{sign}(y - \hat y)\left\{\sqrt{2[\ell_{\text{full}}(y) - \ell(\hat\beta; y)]}\right\} \]

Note

\[\text{sign}(x) \stackrel{\text{def}}{=}\frac{x}{|x|}\] In other words:

  • \(\text{sign}(x) = -1\) if \(x < 0\)
  • \(\text{sign}(x) = 0\) if \(x = 0\)
  • \(\text{sign}(x) = 1\) if \(x > 0\)

4.5 Zero-inflation

4.5.1 Models for zero-inflated counts

We assume a latent (unobserved) binary variable, \(Z\), which we model using logistic regression:

\[P(Z=1|X=x) = \pi(x) = \text{expit}(\gamma_0 + \gamma_1 x_1 +...)\]

According to this model, if \(Z=1\), then \(Y\) will always be zero, regardless of \(X\) and \(T\):

\[P(Y=0|Z=1,X=x,T=t) = 1\]

Otherwise (if \(Z=0\)), \(Y\) will have a Poisson distribution, conditional on \(X\) and \(T\), as above.

Even though we never observe \(Z\), we can estimate the parameters \(\gamma_0\)-\(\gamma_p\), via maximum likelihood:

\[ \begin{aligned} P(Y=y|X=x,T=t) &= P(Y=y,Z=1|...) + P(Y=y,Z=0|...) \end{aligned} \] (by the Law of Total Probability)

where \[ \begin{aligned} P(Y=y,Z=z|...) &= P(Y=y|Z=z,...)P(Z=z|...) \end{aligned} \]


Exercise 4.1 Expand \(P(Y=0|X=x,T=t)\), \(P(Y=1|X=x,T=t)\) and \(P(Y=y|X=x,T=t)\) into expressions involving \(P(Z=1|X=x,T=t)\) and \(P(Y=y|Z=0,X=x,T=t)\).


Exercise 4.2 Derive the expected value and variance of \(Y\), conditional on \(X\) and \(T\), as functions of \(P(Z=1|X=x,T=t)\) and \(\mathbb{E}[Y|Z=0,X=x,T=t]\).

4.6 Over-dispersion

4.6.1 Negative binomial models

The Poisson distribution model forces the variance to equal the mean. In practice, many count distributions will have a variance substantially larger than the mean (or occasionally smaller).

When we encounter this, we can try to reduce the residual variance by adding more covariates. However, there are also alternatives to the Poisson model.

Most notably, the negative binomial model.

We can still model \(\mu\) as a function of \(X\) and \(T\) as before, and we can combine this model with zero-inflation by using it in place of the Poisson distribution for \(\text{P}(Y=y|Z=0,X=x,T=t)\).

4.6.2 Quasipoisson

An alternative to Negative binomial is the “quasipoisson” distribution. I’ve never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as \(\text{Var}\left(Y\right) = \mu\theta\), and estimates \(\theta\) accordingly.

See ?quasipoisson in R for more.

4.7 Example: needle-sharing

(adapted from Vittinghoff et al. (2012), §8)

Show R code
library(tidyverse)
library(haven)
needles = read_dta("inst/extdata/needle_sharing.dta") |> 
  as_tibble() |> 
  mutate(polydrug = 
           ifelse(polydrug, "multiple drugs used", "one drug used") |> 
           factor() |> 
           relevel(ref = "one drug used"),
         homeless = ifelse(homeless, "homeless", "not homeless") |> 
           factor() |> relevel(ref = "not homeless"))
needles
Table 4.1: Needle-sharing data

Show R code
library(ggplot2)

needles |> 
  ggplot(
    aes(
      x = age,
      y = shared_syr,
      col = sex,
      shape = ethn
    )
  ) + 
  geom_point(
    size = 4, 
    alpha = .5) +
  facet_grid(
    cols = vars(polydrug), 
    rows = vars(homeless)) +
  theme(legend.position = "bottom")
Figure 4.1: Rates of needle sharing

4.8 model

Show R code
glm1 = glm(
  data = needles,
  family = stats::poisson,
  shared_syr ~ age + sex + homeless*polydrug
)

library(parameters)
glm1 |> parameters(exponentiate = TRUE) |> 
  print_md()
Table 4.2: Poisson model for needle-sharing data
Parameter IRR SE 95% CI z p
(Intercept) 8.95 2.26 (5.46, 14.68) 8.69 < .001
age 0.97 5.58e-03 (0.96, 0.98) -5.41 < .001
sex (M) 0.50 0.06 (0.40, 0.63) -5.88 < .001
sex (Trans) 7.99e-08 1.02e-04 (0.00, Inf) -0.01 0.990
homeless (homeless) 3.58 0.45 (2.79, 4.59) 10.06 < .001
polydrug (multiple drugs used) 1.45e-07 5.82e-05 (0.00, Inf) -0.04 0.969
homeless (homeless) × polydrug (multiple drugs used) 1.27e+06 5.12e+08 (0.00, Inf) 0.03 0.972
Show R code
Table 4.3: Diagnostics for Poisson model

Table 4.4: Negative binomial model for needle-sharing data
Show R code
library(MASS) #need this for glm.nb()
glm1.nb = glm.nb(
  data = needles,
  shared_syr ~ age + sex + homeless*polydrug
)
summary(glm1.nb)
#> 
#> Call:
#> glm.nb(formula = shared_syr ~ age + sex + homeless * polydrug, 
#>     data = needles, init.theta = 0.08436295825, link = log)
#> 
#> Coefficients:
#>                                                 Estimate  Std. Error z value
#> (Intercept)                                   2.0557e+00  1.7133e+00  1.1999
#> age                                          -2.7630e-02  3.8153e-02 -0.7242
#> sexM                                         -1.0648e+00  8.0684e-01 -1.3197
#> sexTrans                                     -2.5336e+01  1.1484e+05 -0.0002
#> homelesshomeless                              1.6546e+00  7.2207e-01  2.2916
#> polydrugmultiple drugs used                  -2.4562e+01  3.6055e+04 -0.0007
#> homelesshomeless:polydrugmultiple drugs used  2.3178e+01  3.6055e+04  0.0006
#>                                              Pr(>|z|)  
#> (Intercept)                                   0.23019  
#> age                                           0.46895  
#> sexM                                          0.18694  
#> sexTrans                                      0.99982  
#> homelesshomeless                              0.02193 *
#> polydrugmultiple drugs used                   0.99946  
#> homelesshomeless:polydrugmultiple drugs used  0.99949  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(0.0844) family taken to be 1)
#> 
#>     Null deviance: 69.8027  on 120  degrees of freedom
#> Residual deviance: 57.7820  on 114  degrees of freedom
#>   (7 observations deleted due to missingness)
#> AIC: 317.506
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  0.0844 
#>           Std. Err.:  0.0197 
#> 
#>  2 x log-likelihood:  -301.5060
Show R code
tibble(name = names(coef(glm1)), poisson = coef(glm1), nb = coef(glm1.nb))
Table 4.5: Poisson versus Negative Binomial Regression coefficient estimates