5  Parametric Bootstrap

The parametric bootstrap is a powerful, general tool to obtain confidence intervals for estimates from parametric models.

Parametric bootstrap versus alternatives

The parametric bootstrap is used less often than the nonparametric bootstrap, because it relies heavily on the assumed parametric model. I focus on the parametric bootstrap here, because it builds excellent intuition for “what we’re up to” when we are estimating standard errors.

Algorithm: Parametric Bootstrap Estimator

  1. Input: Observed data \(x_1, x_2, \dots, x_n\), a parametric model \(f(\theta)\), an estimator \(\hat{\theta}\), and a quantity of interest \(\hat{\tau} = \tau(\hat{\theta})\).1
  2. Compute \(\hat{\theta}\) from the observed data.
  3. For \(b = 1, 2, \dots, B\):2
    1. Generate a bootstrap sample \(x_1^{*(b)}, x_2^{*(b)}, \dots, x_n^{*(b)} \sim f(\hat{\theta})\). That is, simulate a new data set from the fitted parametric model.3
    2. Fit the model to the simulated data set \(x_1^{*(b)}, x_2^{*(b)}, \dots, x_n^{*(b)}\) and compute \(\hat{\theta}^{*(b)}\).
    3. Compute \(\hat{\tau}^{*(b)} = \tau \left( \hat{\theta}^{*(b)}\right)\), which might be \(\theta^{*(b)}\) itself if no transformation is needed.
  4. Summarize the empirical distribution of \(\hat{\tau}^{*(1)}, \hat{\tau}^{*(2)}, \dots, \hat{\tau}^{*(B)}\) using the SD to estimate the SE or 5th/95th percentiles to obtain a 90% confidence interval.

1 This works for any statistic that is a function of \(x_1, \dots, x_n\), but we’ll mainly use ML estimators and direct transformations of those.

2 We typically use \(B = 2,000\) as a good starting point.

3 For example, if we use an exponential model and estimate the rate as \(\hat{\lambda}\), then we would simulate each data set from an exponential distribution with parameter \(\hat{\lambda}\). The first data set is \(x_1^{*(1)}, \dots, x_n^{*(1)}\), the second data set is \(x_1^{*(2)}, \dots, x_n^{*(2)}\), and so on. In R, that would be rexp(N, rate = lambda_hat)

Importantly, we lean heavily on the assumption that we have a good model of the distribution of the data. (The predictive distribution allows us to assess this.) There’s also a nonparametric bootstrap, which is much more popular. We consider that later in the semester.

5.1 Example: Toothpaste Cap Problem

The code below implements the parametric bootstrap for the toothpaste cap problem. For 2,000 iterations, it draws 150 observations from a Bernoulli distribution with \(\hat{\pi} = \frac{8}{150}\). For each iteration, it computes the ML estimate of \(\pi\) for the bootstrapped data set. Then it uses the SD of the bootstrap estimates to estimate the SE and computes the percentiles of the bootstrap estimates to obtain the confidence interval.

n_bs <- 2000
bs_est <- numeric(n_bs)  # a container for the estimates
for (i in 1:n_bs) {
  bs_y <- rbinom(150, size = 1, prob = 8/150)
  bs_est[i] <- mean(bs_y)
}

print(sd(bs_est), digits = 2)  # se estimate
[1] 0.018
print(quantile(bs_est, probs = c(0.05, 0.95)), digits = 2)  # 90% ci
   5%   95% 
0.027 0.087 

We leave an evaluation of this confidence interval (i.e., Does it capture \(\theta\) 90% of the time?) to later.

5.2 Example: Beta Distribution

Now let’s apply the parametric bootstrap to a two-parameter model: the beta distribution.

First, let’s simulate a (fake) data set to use.

# set parameters
alpha <- 5
beta <- 2

# simulate data
set.seed(1234)
n <- 100
y <- rbeta(n, alpha, beta)

Now let’s find the ML estimates of the two shape parameters.

# obtain ml estimates
log_lik_fn <- function(par = c(2, 2), y) {
  a <- par[1]  # pulling these out makes the code a bit easier to follow
  b <- par[2]
  log_lik_i <- dbeta(y, shape1 = a, shape2 = b, log = TRUE)
  log_lik <- sum(log_lik_i)
  return(log_lik)
}
opt <- optim(par = c(3, 3), fn = log_lik_fn, y = y,
             control = list(fnscale = -1))
ml_est <- opt$par
print(ml_est, digits = 3)
[1] 5.46 1.91

Now let’s use those ML estimates to perform a parametric bootstrap and find 95% CIs for the shape parameters.

# obtain parametric bootstrap 95% ci for alpha and beta
n_bs <- 2000
bs_est <- matrix(NA, nrow = n_bs, ncol = 2)  # a container for the estimates
for (i in 1:n_bs) {
  bs_y <- rbeta(n, shape1 = ml_est[1], shape2 = ml_est[2])
  bs_opt <- optim(par = c(3, 3), fn = log_lik_fn, y = bs_y,
             control = list(fnscale = -1))
  bs_est[i, ] <- bs_opt$par
}
ci <- apply(bs_est, MARGIN = 2, quantile, probs = c(0.025, 0.975))
print(ci, digits = 3)  # 95% ci
      [,1] [,2]
2.5%  4.25 1.52
97.5% 7.52 2.58

If instead we cared about the mean of the beta distribution (which is \(\frac{\alpha}{\alpha + \beta}\)), we can use the parametric bootstrap to obtain a confidence interval for that quantity as well. Since we drew the fake data set from a beta distribution with \(\alpha = 5\) and \(\beta = 2\), the true mean is \(\frac{\alpha}{\alpha + \beta} = \frac{5}{7} \approx 0.71\).

# obtain parametric bootstrap 95% ci for mean
n_bs <- 2000
bs_est <- numeric(n_bs)  # a container for the estimates
for (i in 1:n_bs) {
  bs_y <- rbeta(n, shape1 = ml_est[1], shape2 = ml_est[2])
  bs_opt <- optim(par = c(3, 3), fn = log_lik_fn, y = bs_y,
             control = list(fnscale = -1))
  bs_alpha <- bs_opt$par[1]
  bs_beta <- bs_opt$par[2]
  bs_est[i] <- bs_alpha/(bs_alpha + bs_beta)
}
print(quantile(bs_est, probs = c(0.05, 0.95)), digits = 2)  # 95% ci
  5%  95% 
0.72 0.77 
print(sd(bs_est), digits = 2)  # estimate of SE
[1] 0.015
# true mean 
print(alpha/(alpha + beta), digits = 2)
[1] 0.71