---
title: "Example Bayesian Analyses"
format:
html: default
revealjs:
output-file: bayesian-examples-slides.html
pdf:
output-file: bayesian-examples-handout.pdf
docx:
output-file: bayesian-examples-handout.docx
---
{{< include shared-config.qmd >}}
This appendix collects worked Bayesian analyses,
applying the concepts of
[the Bayesian-analysis appendix](intro-bayes.qmd#sec-bayes-paradigms)
and the simulation machinery of
[the MCMC appendix](mcmc-methods.qmd#sec-mcmc-methods)
to regression models we have already met in the frequentist setting.
The structure parallels [@dobson4e, Chapter 14].
Each analysis specifies a model, places priors on its parameters,
draws posterior samples by MCMC, and summarizes the posterior.
# A first example: a single proportion {#sec-bayes-examples}
We begin with the simplest possible model — a single Bernoulli
probability — fit with **JAGS** ("Just Another Gibbs Sampler"),
which we drive from R.
This example introduces the mechanics (specifying a model, supplying
data, monitoring parameters, and inspecting draws) that the later
analyses reuse.
{{< include dobson-bayes-example.qmd >}}
# Binary outcomes: logistic regression {#sec-bayes-examples-logistic}
For a binary outcome $Y_i \sim \Bernoulli(\pi_i)$ with
$\logit(\pi_i) = \tp{\vxi}\vbeta$,
the Bayesian analysis places a prior on the coefficient vector
$\vbeta$ — typically independent, weakly informative normals such as
$\beta_j \sim \ndist{0, \sigma_0^2}$ with $\sigma_0$ large on the log-odds
scale — and samples the posterior $p(\vbeta \mid \vy)$ by MCMC
[@dobson4e, Chapter 14, p. 318].
The posterior draws for each $\beta_j$ are summarized by their mean and a
$95\%$ credible interval; exponentiating gives posterior summaries for
the odds ratios.
Unlike the frequentist fit, no large-sample normal approximation is
needed: the credible interval is read directly from the posterior
quantiles, and odds ratios or other nonlinear functions of
$\vbeta$ are obtained simply by transforming the draws.
:::{#exm-bayes-logistic}
#### Bayesian logistic regression
We simulate $N = 200$ observations from a logistic model with a single
continuous predictor, intercept $\beta_0 = -0.5$ and slope
$\beta_1 = 1.2$, then recover the coefficients by MCMC.
```{r}
#| code-fold: show
#| message: false
#| warning: false
set.seed(2024)
N <- 200L
x <- rnorm(N)
beta0_true <- -0.5
beta1_true <- 1.2
y <- rbinom(N, size = 1, prob = plogis(beta0_true + beta1_true * x))
logistic_model <- "model {
for (i in 1:N) {
y[i] ~ dbern(pi[i])
logit(pi[i]) <- beta0 + beta1 * x[i]
}
beta0 ~ dnorm(0, 0.01) # weakly informative: precision 0.01 = sd 10
beta1 ~ dnorm(0, 0.01)
OR <- exp(beta1) # odds ratio per unit increase in x
}"
jags_logistic <- run.jags(
model = logistic_model,
data = list(y = y, x = x, N = N),
monitor = c("beta0", "beta1", "OR"),
n.chains = 2, inits = initsfunction,
burnin = 1000, sample = 4000, method = "rjags"
)
summary(jags_logistic)[, c("Mean", "Lower95", "Upper95", "psrf")] |> round(3)
```
The posterior means recover $\beta_0 \approx -0.5$ and
$\beta_1 \approx 1.2$, each $95\%$ credible interval covers its true
value, and the convergence statistic `psrf` ($\hR$) is near $1$.
The odds-ratio summary comes for free by monitoring $\exp(\beta_1)$.
:::
# Survival analysis {#sec-bayes-examples-survival}
Parametric survival models (for example, exponential or Weibull event
times) admit a Bayesian treatment in which priors are placed on the
baseline-hazard parameters and the regression coefficients, and the
posterior is sampled by MCMC [@dobson4e, Chapter 14, p. 330].
Censoring is handled inside the model through the likelihood: an event
contributes its density $\haz(y)\,\surv(y)$ and a censored observation
contributes its survival probability $\surv(y)$, exactly as in the
frequentist [right-censored likelihood](intro-to-survival-analysis.qmd#likelihood-with-censoring).
:::{#exm-bayes-survival}
#### Bayesian exponential survival regression
We simulate right-censored exponential survival times with a binary
covariate (say, treatment) whose true log hazard ratio is
$\beta_1 = -0.7$, and fit the model with the **zeros trick**, which lets
us write the exact censored log-likelihood directly in JAGS.
```{r}
#| code-fold: show
#| message: false
#| warning: false
set.seed(2026)
N <- 200L
x <- rbinom(N, size = 1, prob = 0.5)
beta0_true <- log(0.05) # log baseline hazard
beta1_true <- -0.7 # log hazard ratio for x = 1
event_time <- rexp(N, rate = exp(beta0_true + beta1_true * x))
cens_time <- rexp(N, rate = 0.03)
y <- pmin(event_time, cens_time)
event <- as.integer(event_time <= cens_time) # 1 = event, 0 = censored
surv_model <- "model {
C <- 10000
for (i in 1:N) {
log(lambda[i]) <- beta0 + beta1 * x[i]
# exponential log-likelihood, right-censored:
loglik[i] <- event[i] * log(lambda[i]) - lambda[i] * y[i]
phi[i] <- C - loglik[i]
zeros[i] ~ dpois(phi[i])
}
beta0 ~ dnorm(0, 0.01)
beta1 ~ dnorm(0, 0.01)
HR <- exp(beta1) # hazard ratio for x = 1 vs x = 0
}"
jags_surv <- run.jags(
model = surv_model,
data = list(y = y, x = x, event = event, N = N, zeros = rep(0, N)),
monitor = c("beta0", "beta1", "HR"),
n.chains = 2, inits = initsfunction,
burnin = 1000, sample = 4000, method = "rjags"
)
summary(jags_surv)[, c("Mean", "Lower95", "Upper95", "psrf")] |> round(3)
```
The posterior for $\beta_1$ concentrates near the true $-0.7$, and the
monitored $\exp(\beta_1)$ gives the hazard ratio with a credible interval
that accounts for the censoring without any large-sample approximation.
:::
# Random effects {#sec-bayes-examples-random-effects}
Hierarchical models (introduced in
[the Bayesian-analysis appendix](intro-bayes.qmd#sec-bayes-hierarchies))
are the Bayesian counterpart of the random-effects and multilevel models
of [@dobson4e, Chapter 11].
Group-level parameters $\theta_j \sim \ndist{\mu, \tau^2}$ are drawn from
a common distribution whose hyperparameters $\mu$ and $\tau$ carry their
own priors.
The posterior **shrinks** each group's estimate toward the overall mean
by an amount the data determine through $\tau$, and — unlike the
frequentist fit — the posterior for $\tau$ propagates the uncertainty in
the between-group spread into every group-level summary
[@dobson4e, Chapter 14, p. 333].
:::{#exm-bayes-random-effects}
#### Bayesian random-intercept model
We simulate $J = 8$ groups of $12$ observations each, with group means
drawn from $\ndist{\mu, \tau^2}$ ($\mu = 5$, $\tau = 1.5$) and
within-group noise $\sigma = 2$, then recover the hyperparameters.
```{r}
#| code-fold: show
#| message: false
#| warning: false
set.seed(2025)
J <- 8L
n_j <- 12L
mu_true <- 5
tau_true <- 1.5
sigma_true <- 2
theta_true <- rnorm(J, mean = mu_true, sd = tau_true)
group <- rep(seq_len(J), each = n_j)
y <- rnorm(J * n_j, mean = theta_true[group], sd = sigma_true)
re_model <- "model {
for (i in 1:N) {
y[i] ~ dnorm(theta[group[i]], inv_sigma2)
}
for (j in 1:J) {
theta[j] ~ dnorm(mu, inv_tau2) # random intercepts
}
mu ~ dnorm(0, 1.0E-4)
inv_sigma2 ~ dgamma(0.01, 0.01)
inv_tau2 ~ dgamma(0.01, 0.01)
sigma <- 1 / sqrt(inv_sigma2) # within-group SD
tau <- 1 / sqrt(inv_tau2) # between-group SD
}"
jags_re <- run.jags(
model = re_model,
data = list(y = y, group = group, N = length(y), J = J),
monitor = c("mu", "tau", "sigma"),
n.chains = 2, inits = initsfunction,
burnin = 1000, sample = 4000, method = "rjags"
)
summary(jags_re)[, c("Mean", "Lower95", "Upper95", "psrf")] |> round(3)
```
The posterior recovers the overall mean $\mu \approx 5$, the
between-group SD $\tau \approx 1.5$, and the within-group SD
$\sigma \approx 2$. Crucially, the credible interval for $\tau$ reports
genuine uncertainty about the between-group spread — uncertainty a
frequentist point estimate of the variance component discards.
:::
# Bayesian model averaging {#sec-bayes-examples-bma}
When several candidate models are plausible, **Bayesian model averaging
(BMA)** forms predictions by averaging over models, weighting each by its
posterior probability rather than committing to a single "best" model
[@dobson4e, Chapter 14, p. 338].
This propagates *model* uncertainty — not just parameter uncertainty —
into the final inference.
Unlike the single-model fits above, BMA requires fitting and comparing a
*collection* of models and computing their posterior weights, which
connects directly to the model-comparison criteria of
[the MCMC appendix](mcmc-methods.qmd#sec-mcmc-dic) and to our
[predictor-selection chapter](predictor-selection.qmd).
:::{#exm-bayes-bma}
#### A BIC approximation to Bayesian model averaging
Marginal likelihoods are hard to compute, but the Bayesian information
criterion provides a convenient approximation to the posterior model
probability [@schwarz1978estimating]: for model $m$,
$p(m \mid \vy) \propto \expf{-\tfrac{1}{2}\operatorname{BIC}_m}$.
We simulate data in which only $x_1$ and $x_2$ affect the outcome
(with $x_3$ irrelevant), enumerate every subset of the three predictors,
and form BIC-based model weights.
```{r}
#| code-fold: show
set.seed(2027)
N <- 120L
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N) # irrelevant
y <- 1 + 0.8 * x1 + 0.5 * x2 + rnorm(N)
dat <- data.frame(y, x1, x2, x3)
predictors <- c("x1", "x2", "x3")
subsets <- unlist(
lapply(0:3, function(k) combn(predictors, k, simplify = FALSE)),
recursive = FALSE
)
models <- lapply(subsets, function(vars) {
rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
lm(as.formula(paste("y ~", rhs)), data = dat)
})
# BIC approximation to posterior model probabilities:
bic <- vapply(models, BIC, numeric(1))
weight <- exp(-0.5 * (bic - min(bic)))
weight <- weight / sum(weight)
```
The **posterior inclusion probability** of each predictor is the total
weight of the models that contain it:
```{r}
#| code-fold: show
pip <- vapply(
predictors,
function(v) sum(weight[vapply(subsets, function(s) v %in% s, logical(1))]),
numeric(1)
)
round(pip, 3)
```
and the **model-averaged** slope for $x_1$ averages each model's estimate
(taken as $0$ when $x_1$ is absent from that model):
```{r}
#| code-fold: show
beta_x1 <- vapply(models, function(m) {
cf <- coef(m)
if ("x1" %in% names(cf)) cf[["x1"]] else 0
}, numeric(1))
round(c(bma_slope_x1 = sum(weight * beta_x1)), 3)
```
The real predictors $x_1$ and $x_2$ receive inclusion probabilities near
$1$, the irrelevant $x_3$ a small one, and the model-averaged slope for
$x_1$ sits near its true value $0.8$ — with the averaging across models,
rather than a single selected model, accounting for which predictors
belong.
:::
# References {.unnumbered}
::: {#refs}
:::