This chapter presents models for count data outcomes. With covariates, the event rate \(\lambda\) becomes a function of the covariates \(\tilde{X}= (X_1, \dots,X_n)\). Typically, count data models use a \(\text{log}\left\{\right\}\) link function, and thus an \(\text{exp}\left\{\right\}\) inverse-link function. That is:
In contrast with the \(X\)s, \(T\) enters this expression with a \(\text{log}\left\{\right\}\) transformation and without a corresponding \(\beta\) coefficient.
Note
Terms that enter the linear component of a model without a coefficient, such as \(\text{log}\left\{t\right\}\) here, are called offsets.
4.1.1 Rate ratios
Differences on the log-rate scale become ratios on the rate scale, because
Therefore, according to this model, differences of \(\delta\) in covariate \(x_j\) correspond to rate ratios of \(\text{exp}\left\{\beta_j \cdot \delta\right\}\).
That is, letting \(\tilde{X}_{-j}\) denote vector \(\tilde{X}\) with element \(j\) removed:
According to this model, if \(Z=1\), then \(Y\) will always be zero, regardless of \(X\) and \(T\):
\[P(Y=0|Z=1,X=x,T=t) = 1\]
Otherwise (if \(Z=0\)), \(Y\) will have a Poisson distribution, conditional on \(X\) and \(T\), as above.
Even though we never observe \(Z\), we can estimate the parameters \(\gamma_0\)-\(\gamma_p\), via maximum likelihood:
\[
\begin{aligned}
P(Y=y|X=x,T=t) &= P(Y=y,Z=1|...) + P(Y=y,Z=0|...)
\end{aligned}
\] (by the Law of Total Probability)
where \[
\begin{aligned}
P(Y=y,Z=z|...)
&= P(Y=y|Z=z,...)P(Z=z|...)
\end{aligned}
\]
Exercise 4.1 Expand \(P(Y=0|X=x,T=t)\), \(P(Y=1|X=x,T=t)\) and \(P(Y=y|X=x,T=t)\) into expressions involving \(P(Z=1|X=x,T=t)\) and \(P(Y=y|Z=0,X=x,T=t)\).
Exercise 4.2 Derive the expected value and variance of \(Y\), conditional on \(X\) and \(T\), as functions of \(P(Z=1|X=x,T=t)\) and \(\mathbb{E}[Y|Z=0,X=x,T=t]\).
4.6 Over-dispersion
4.6.1 Negative binomial models
The Poisson distribution model forces the variance to equal the mean. In practice, many count distributions will have a variance substantially larger than the mean (or occasionally smaller).
When we encounter this, we can try to reduce the residual variance by adding more covariates. However, there are also alternatives to the Poisson model.
We can still model \(\mu\) as a function of \(X\) and \(T\) as before, and we can combine this model with zero-inflation by using it in place of the Poisson distribution for \(\text{P}(Y=y|Z=0,X=x,T=t)\).
4.6.2 Quasipoisson
An alternative to Negative binomial is the “quasipoisson” distribution. I’ve never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as \(\text{Var}\left(Y\right) = \mu\theta\), and estimates \(\theta\) accordingly.
Table 4.5: Poisson versus Negative Binomial Regression coefficient estimates
Dobson, Annette J, and Adrian G Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. CRC press. https://doi.org/10.1201/9781315182780.
Vittinghoff, Eric, David V Glidden, Stephen C Shiboski, and Charles E McCulloch. 2012. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd ed. Springer. https://doi.org/10.1007/978-1-4614-1353-0.
---subtitle: "Poisson regression and variations"---# Models for Count Outcomes---## Acknowledgements {.unnumbered}This content is adapted from:- @dobson4e, Chapter 9- @vittinghoff2e, Chapter 8---{{< include shared-config.qmd >}}## Introduction::: notesThis chapter presents models for [count data](probability.qmd#sec-count-vars) outcomes.With covariates,the event rate $\lambda$ becomes a function of the covariates$\vX = (X_1, \dots,X_n)$.Typically, count data models usea $\log{}$ link function,and thus an $\exp{}$ inverse-link function. That is::::$$\begin{aligned}\Expp[Y | \vX = \vx, T = t] &= \mu(\vx,t)\\ \mu(\vx,t) &= \lambda(\vx)\cdot t\\ \lambda(\vx) &= \exp{\eta(\vx)}\\ \eta(\vx) &= \vx'\tilde \beta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\end{aligned}$$Therefore,$$\begin{aligned}\log{\{\Expp[Y | \vX = \vx,T=t]\}}&= \log{\{\mu(\vx)\}}\\&=\log{\{\lambda(\vx) \cdot t \}}\\&=\log{\lambda(\vx)} + \log{t}\\&=\log{\exp{\eta(\vx)}} + \log{t}\\&=\eta(\vx) + \log{t}\\&=\vx'\tilde\beta + \log{t}\\&=(\beta_0 +\beta_1 x_1+\dots + \beta_p x_p) + \log{t}\\\end{aligned}$$In contrast with the $X$s, $T$ enters this expression with a $\log{}$ transformation and without a corresponding $\beta$ coefficient. :::{.callout-note}Terms that enter the linear component of a model without a coefficient, such as $\log{t}$ here, are called **offsets**.:::### Rate ratios {.smaller}::: notesDifferences on the log-rate scale become ratios on the rate scale, because:::$$\exp{a-b} = \frac{\exp{a}}{\exp{b}}$$[math-preqs.qmd#cor-exp-sum]Therefore, according to this model, **differences of $\delta$ in covariate $x_j$ correspond to rate ratios of $\exp{\beta_j \cdot \delta}$**.That is, letting $\vX_{-j}$ denote vector $\vX$ with element $j$ removed: $$\begin{aligned}&{\left\{ \log{\Expp[Y |{\color{red}{X_j = a}}, \vX_{-j}=\vx_{-j},T=t]} \atop {-\log{\Expp[Y |{\color{red}{X_j = b}}, \vX_{-j}=\vx_{-j},T=t]}} \right\}}\\&= {\left\{\log{t} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (a)}} + ...+\beta_p x_p \atop {-\log{t} + \beta_0 + \beta_1 x_1 + ... + {\color{red}{\beta_j (b)}} + ...+\beta_p x_p}\right\}}\\&= \color{red}{\beta_j(a-b)}\end{aligned}$$And accordingly,$$\begin{aligned}\frac{\mathbb{E}[Y |{\color{red}{X_j = a}}, \vX_{-j} = \vx_{-j}, T = t]}{\Expp[Y |{\color{red}{X_j = b}}, \vX_{-j}=\vx_{-j},T=t]}= \exp{{\color{red}{\beta_j(a-b)}}}\end{aligned}$$## Inference for count regression models### Confidence intervals for regression coefficients and rate ratiosAs usual:$$\beta \in \left[\ci \right]$$Rate ratios: exponentiate CI endpoints$$\exp{\beta} \in \left[\exp{\ci} \right]$$### Hypothesis tests for regression coefficients$$t = \frac{\hat \beta - \beta_0}{\hse{\hb}}$$Compare $t$ or $|t|$ to the tails of the standard Gaussian distribution, according to the null hypothesis.### Comparing nested modelslog(likelihood ratio) tests, as usual.## Prediction$$\begin{aligned}\hat y &\eqdef \hat{\Expp}[Y|\vX= \vx,T=t]\\&=\hat\mu(\vx, t)\\&=\hat\lambda(\vx) \cdot t\\&=\exp{\hat\eta(\vx)} \cdot t\\&=\exp{\vx'\hat{\boldsymbol{\beta}}} \cdot t\end{aligned}$$## Diagnostics### Residuals#### Observation residuals$$e \eqdef y - \hat y$$#### Pearson residuals$$r = \frac{e}{\hse{e}} \approx \frac{e}{\sqrt{\hat y}}$$#### Standardized Pearson residuals$$r_p = \frac{r}{\sqrt{1-h}}$$where $h$ is the "leverage" (which we will continue to leave undefined).#### Deviance residuals$$d_k = \text{sign}(y - \hat y)\left\{\sqrt{2[\ell_{\text{full}}(y) - \ell(\hat\beta; y)]}\right\}$$:::{.callout-note}$$\text{sign}(x) \eqdef \frac{x}{|x|}$$In other words:* $\text{sign}(x) = -1$ if $x < 0$* $\text{sign}(x) = 0$ if $x = 0$* $\text{sign}(x) = 1$ if $x > 0$::::{.content-hidden}```{r}plot(sign,xlim =c(-1,1), xlab ="x", ylab ="sign(x)")```:::::::## Zero-inflation### Models for zero-inflated countsWe assume a latent (unobserved) binary variable, $Z$, which we model using logistic regression:$$P(Z=1|X=x) = \pi(x) = \expit(\gamma_0 + \gamma_1 x_1 +...)$$According to this model, if $Z=1$, then $Y$ will always be zero, regardless of $X$ and $T$:$$P(Y=0|Z=1,X=x,T=t) = 1$$Otherwise (if $Z=0$), $Y$ will have a Poisson distribution, conditional on $X$ and $T$, as above.Even though we never observe $Z$, we can estimate the parameters $\gamma_0$-$\gamma_p$, via maximum likelihood:$$\begin{aligned}P(Y=y|X=x,T=t) &= P(Y=y,Z=1|...) + P(Y=y,Z=0|...)\end{aligned}$$(by the Law of Total Probability)where $$\begin{aligned}P(Y=y,Z=z|...) &= P(Y=y|Z=z,...)P(Z=z|...)\end{aligned}$$---::: {#exr-zinf-pmf}Expand $P(Y=0|X=x,T=t)$, $P(Y=1|X=x,T=t)$ and $P(Y=y|X=x,T=t)$ into expressions involving $P(Z=1|X=x,T=t)$ and $P(Y=y|Z=0,X=x,T=t)$.:::---::: {#exr-zinf-moments}Derive the expected value and variance of $Y$, conditional on $X$ and $T$, as functions of $P(Z=1|X=x,T=t)$ and $\Expp[Y|Z=0,X=x,T=t]$.:::## Over-dispersion### Negative binomial models::: notesThe Poisson distribution model **forces** the variance to equal the mean. In practice, many count distributions will have a variance substantially larger than the mean (or occasionally smaller).When we encounter this, we can try to reduce the residual variance by adding more covariates. However, there are also alternatives to the Poisson model.Most notably, the [negative binomial model](probability.qmd#sec-nb-dist).:::We can still model $\mu$ as a function of $X$ and $T$ as before, and we can combine this model with zero-inflation by using it in place of the Poisson distribution for $\P(Y=y|Z=0,X=x,T=t)$.### QuasipoissonAn alternative to Negative binomial is the "quasipoisson" distribution. I've never used it, but it seems to be a method-of-moments type approach rather than maximum likelihood. It models the variance as $\Var{Y} = \mu\theta$, and estimates $\theta$ accordingly.See `?quasipoisson` in R for more.## Example: needle-sharing(adapted from @vittinghoff2e, §8){{< include exr-needle-sharing.qmd >}}