This chapter presents models for count data outcomes. With covariates, the event rate \(\lambda\) becomes a function of the covariates \(\tilde{X}= (X_1, \dots,X_n)\). Typically, count data models use a \(\text{log}\left\{\right\}\) link function, and thus an \(\text{exp}\left\{\right\}\) inverse-link function. That is:
In contrast with the other covariates (represented by \(\tilde{X}\)), \(T\) enters this expression with a \(\text{log}\left\{\right\}\) transformation and without a corresponding \(\beta\) coefficient.
Note
Terms that enter the linear component of a model without a coefficient, such as \(\text{log}\left\{t\right\}\) here, are called offsets.
4.1.1 Rate ratios
Differences on the log-rate scale become ratios on the rate scale, because
Therefore, according to this model, differences of \(\delta\) in covariate \(x_j\) correspond to rate ratios of \(\text{exp}\left\{\beta_j \cdot \delta\right\}\).
That is, letting \(\tilde{X}_{-j}\) denote vector \(\tilde{X}\) with element \(j\) removed:
According to this model, if \(Z=1\), then \(Y\) will always be zero, regardless of \(X\) and \(T\):
\[P(Y=0|Z=1,X=x,T=t) = 1\]
Otherwise (if \(Z=0\)), \(Y\) will have a Poisson distribution, conditional on \(X\) and \(T\), as above.
Even though we never observe \(Z\), we can estimate the parameters \(\gamma_0\)-\(\gamma_p\), via maximum likelihood:
\[
\begin{aligned}
P(Y=y|X=x,T=t) &= P(Y=y,Z=1|...) + P(Y=y,Z=0|...)
\end{aligned}
\] (by the Law of Total Probability)
where \[
\begin{aligned}
P(Y=y,Z=z|...)
&= P(Y=y|Z=z,...)P(Z=z|...)
\end{aligned}
\]
Exercise 4.1 Expand \(P(Y=0|X=x,T=t)\), \(P(Y=1|X=x,T=t)\) and \(P(Y=y|X=x,T=t)\) into expressions involving \(P(Z=1|X=x,T=t)\) and \(P(Y=y|Z=0,X=x,T=t)\).
Exercise 4.2 Derive the expected value and variance of \(Y\), conditional on \(X\) and \(T\), as functions of \(P(Z=1|X=x,T=t)\) and \(\mathbb{E}[Y|Z=0,X=x,T=t]\).
4.6 Over-dispersion
The Poisson distribution model forces the variance to equal the mean. In practice, many count distributions will have a variance substantially larger than the mean (or occasionally, smaller).
Definition 4.1 (Overdispersion) A random variable \(X\) is overdispersed relative to a model \(\text{p}(X=x)\) if if its empirical variance in a dataset is larger than the value is predicted by the fitted model \(\hat{\text{p}}(X=x)\).
We can still model \(\mu\) as a function of \(X\) and \(T\) as before, and we can combine this model with zero-inflation (as the conditional distribution for the non-zero component).
4.6.2 Quasipoisson
An alternative to Negative binomial is the “quasipoisson” distribution. I’ve never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as \(\text{Var}\left(Y\right) = \mu\theta\), and estimates \(\theta\) accordingly.
Table 4.6: Poisson versus Negative Binomial Regression coefficient estimates
zero-inflation
Show R code
library(glmmTMB)zinf_fit1=glmmTMB( family ="poisson", data =needles, formula =shared_syr~age+sex+homeless*polydrug, ziformula =~age+sex+homeless+polydrug# fit won't converge with interaction)zinf_fit1|>parameters(exponentiate =TRUE)|>print_md()
Another R package for zero-inflated models is pscl (Zeileis, Kleiber, and Jackman (2008)).
zero-inflated negative binomial model
Show R code
library(glmmTMB)zinf_fit1=glmmTMB( family =nbinom2, data =needles, formula =shared_syr~age+sex+homeless*polydrug, ziformula =~age+sex+homeless+polydrug# fit won't converge with interaction)zinf_fit1|>parameters(exponentiate =TRUE)|>print_md()
Dobson, Annette J, and Adrian G Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. CRC press. https://doi.org/10.1201/9781315182780.
Vittinghoff, Eric, David V Glidden, Stephen C Shiboski, and Charles E McCulloch. 2012. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd ed. Springer. https://doi.org/10.1007/978-1-4614-1353-0.
Zeileis, Achim, Christian Kleiber, and Simon Jackman. 2008. “Regression Models for Count Data in R.”Journal of Statistical Software 27 (8). https://www.jstatsoft.org/v27/i08/.
---subtitle: "Poisson regression and variations"---# Models for Count Outcomes---## Acknowledgements {.unnumbered}This content is adapted from:- @dobson4e, Chapter 9- @vittinghoff2e, Chapter 8---{{< include shared-config.qmd >}}## Introduction::: notesThis chapter presents models for [count data](probability.qmd#sec-count-vars) outcomes.With covariates,the event rate $\lambda$ becomes a function of the covariates$\vX = (X_1, \dots,X_n)$.Typically, count data models usea $\log{}$ link function,and thus an $\exp{}$ inverse-link function. That is::::$$\begin{aligned}\Expp[Y | \vX = \vx, T = t] &= \mu(\vx,t)\\ \mu(\vx,t) &= \lambda(\vx)\cdot t\\ \lambda(\vx) &= \exp{\eta(\vx)}\\ \eta(\vx) &= \vx'\tilde \beta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\end{aligned}$$Therefore,$$\begin{aligned}\logf{\Expp[Y | \vX = \vx,T=t]}&= \logf{\mu(\vx)}\\&=\logf{\lambda(\vx) \cdot t}\\&=\logf{\lambda(\vx)} + \log{t}\\&=\logf{\exp{\eta(\vx)}} + \log{t}\\&=\eta(\vx) + \log{t}\\&=\vx'\tilde\beta + \log{t}\\&=(\beta_0 +\beta_1 x_1+\dots + \beta_p x_p) + \log{t}\\\end{aligned}$$In contrast with the other covariates (represented by $\vX$), $T$ enters this expression with a $\log{}$ transformation and without a corresponding $\beta$ coefficient. :::{.callout-note}Terms that enter the linear component of a model without a coefficient, such as $\log{t}$ here, are called **offsets**.:::### Rate ratios {.smaller}::: notesDifferences on the log-rate scale become ratios on the rate scale, because:::$$\exp{a-b} = \frac{\exp{a}}{\exp{b}}$$(recall from [Algebra 2](math-prereqs.qmd#cor-exp-sum))Therefore, according to this model, **differences of $\delta$ in covariate $x_j$ correspond to rate ratios of $\exp{\beta_j \cdot \delta}$**.That is, letting $\vX_{-j}$ denote vector $\vX$ with element $j$ removed: $$\begin{aligned}&{\left\{ \log{\Expp[Y |{\color{red}{X_j = a}}, \vX_{-j}=\vx_{-j},T=t]} \atop {-\log{\Expp[Y |{\color{red}{X_j = b}}, \vX_{-j}=\vx_{-j},T=t]}} \right\}}\\&= {\left\{\log{t} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (a)}} + ...+\beta_p x_p \atop {-\log{t} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (b)}} + ...+\beta_p x_p}\right\}}\\&= \color{red}{\beta_j(a-b)}\end{aligned}$$And accordingly,$$\begin{aligned}\frac{\mathbb{E}[Y |{\color{red}{X_j = a}}, \vX_{-j} = \vx_{-j}, T = t]}{\Expp[Y |{\color{red}{X_j = b}}, \vX_{-j}=\vx_{-j},T=t]}= \exp{{\color{red}{\beta_j(a-b)}}}\end{aligned}$$## Inference for count regression models### Confidence intervals for regression coefficients and rate ratiosAs usual:$$\beta \in \left[\ci \right]$$Rate ratios: exponentiate CI endpoints$$\exp{\beta} \in \left[\exp{\ci} \right]$$### Hypothesis tests for regression coefficients$$t = \frac{\hat \beta - \beta_0}{\hse{\hb}}$$Compare $t$ or $|t|$ to the tails of the standard Gaussian distribution, according to the null hypothesis.### Comparing nested modelslog(likelihood ratio) tests, as usual.## Prediction$$\begin{aligned}\hat y &\eqdef \hat{\Expp}[Y|\vX= \vx,T=t]\\&=\hat\mu(\vx, t)\\&=\hat\lambda(\vx) \cdot t\\&=\exp{\hat\eta(\vx)} \cdot t\\&=\exp{\vx'\hat{\boldsymbol{\beta}}} \cdot t\end{aligned}$$## Diagnostics### Residuals#### Observation residuals$$e \eqdef y - \hat y$$#### Pearson residuals$$r = \frac{e}{\hse{e}} \approx \frac{e}{\sqrt{\hat y}}$$#### Standardized Pearson residuals$$r_p = \frac{r}{\sqrt{1-h}}$$where $h$ is the "leverage" (which we will continue to leave undefined).#### Deviance residuals$$d_k = \text{sign}(y - \hat y)\left\{\sqrt{2[\ell_{\text{full}}(y) - \ell(\hat\beta; y)]}\right\}$$:::{.callout-note}$$\text{sign}(x) \eqdef \frac{x}{|x|}$$In other words:* $\text{sign}(x) = -1$ if $x < 0$* $\text{sign}(x) = 0$ if $x = 0$* $\text{sign}(x) = 1$ if $x > 0$::::{.content-hidden}```{r}plot(sign,xlim =c(-1,1), xlab ="x", ylab ="sign(x)")```:::::::## Zero-inflation### Models for zero-inflated countsWe assume a latent (unobserved) binary variable, $Z$, which we model using logistic regression:$$P(Z=1|X=x) = \pi(x) = \expit(\gamma_0 + \gamma_1 x_1 +...)$$According to this model, if $Z=1$, then $Y$ will always be zero, regardless of $X$ and $T$:$$P(Y=0|Z=1,X=x,T=t) = 1$$Otherwise (if $Z=0$), $Y$ will have a Poisson distribution, conditional on $X$ and $T$, as above.Even though we never observe $Z$, we can estimate the parameters $\gamma_0$-$\gamma_p$, via maximum likelihood:$$\begin{aligned}P(Y=y|X=x,T=t) &= P(Y=y,Z=1|...) + P(Y=y,Z=0|...)\end{aligned}$$(by the Law of Total Probability)where $$\begin{aligned}P(Y=y,Z=z|...) &= P(Y=y|Z=z,...)P(Z=z|...)\end{aligned}$$---::: {#exr-zinf-pmf}Expand $P(Y=0|X=x,T=t)$, $P(Y=1|X=x,T=t)$ and $P(Y=y|X=x,T=t)$ into expressions involving $P(Z=1|X=x,T=t)$ and $P(Y=y|Z=0,X=x,T=t)$.:::---::: {#exr-zinf-moments}Derive the expected value and variance of $Y$, conditional on $X$ and $T$, as functions of $P(Z=1|X=x,T=t)$ and $\Expp[Y|Z=0,X=x,T=t]$.:::## Over-dispersion---::: notesThe Poisson distribution model **forces** the variance to equal the mean. In practice, many count distributions will have a variance substantially larger than the mean (or occasionally, smaller).::::::: {#def-overdispersion}#### OverdispersionA random variable $X$ is **overdispersed** relative to a model $\p(X=x)$ ifif its empirical variance in a dataset is larger than the value is predicted by the fitted model $\hat{\p}(X=x)$.::::::: notesc.f. @dobson4e §3.2.1, 7.7, 9.8; @vittinghoff2e §8.1.5; and <https://en.wikipedia.org/wiki/Overdispersion>.When we encounter overdispersion, we can try to reduce the residual variance by adding more covariates. :::### Negative binomial modelsThere are alternatives to the Poisson model.Most notably, the [negative binomial model](probability.qmd#sec-nb-dist).We can still model $\mu$ as a function of $X$ and $T$ as before, and we can combine this model with zero-inflation (as the conditional distribution for the non-zero component).### QuasipoissonAn alternative to Negative binomial is the "quasipoisson" distribution. I've never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as $\Var{Y} = \mu\theta$, and estimates $\theta$ accordingly.See `?quasipoisson` in R for more.## Example: needle-sharing(adapted from @vittinghoff2e, §8){{< include exr-needle-sharing.qmd >}}## More on count regression- <https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html>