12  Big Four Models

We can extend the ideas from the last chapter on logistic regression to a larger class of probability models.

  1. Choose a distribution.
  2. Choose the parameters to model as functions of covariates and those to model as fixed. Typically, this is a parameter that largely determines the location of the distribution, but we can also model the dispersion or multiple parameters at once.
  3. Choose an inverse link function that maps the unbounded linear predictor \(X_i \beta \in \mathbb R\) onto the parameter space of the modeled parameters.
  4. Fit the model using maximum likelihood. Use Fisher information to estimate the variance. Use invariance property and delta method to obtain quantities of interest and estimate their variance.

12.1 Summary of the big four models

12.1.1 Normal Linear

Component Expression
Outcome Continuous
Model \(y_i \sim \mathcal{N}(\mu_i, \sigma^2)\), where \(\mu_i = X_i \beta\)
Expected value \(\hat{\mu} = X_c \hat{\beta}\) for chosen row \(X_c\)
First difference \(\hat{\Delta} = \hat{\mu}_{hi} - \hat{\mu}_{lo} = X_{hi} \hat{\beta} - X_{lo} \hat{\beta}\) for chosen rows \(X_{hi}\) and \(X_{lo}\)
Function lm() or glm(..., family = guassian)
Parameterization notes Parameterized as expected
Alternatives | \(t\) regression |

12.1.2 Logit

Component Expression
Outcome Binary
Model \(y_i \sim \text{Bernoulli}(\pi_i)\), where \(\pi_i = \text{logit}^{-1}(X_i \beta)\)
Expected value \(\hat{\mu} = \hat{\pi} = \text{logit}^{-1}(X_c \hat{\beta})\)
First difference \(\hat{\Delta} = \hat{\pi}_{hi} - \hat{\pi}_{lo} = \text{logit}^{-1}(X_{hi} \hat{\beta}) - \text{logit}^{-1}(X_{lo} \hat{\beta})\)
Function glm(..., family = binomial)
Parameterization notes Parameterized as expected
Alternatives Probit

12.1.3 Negative Binomial

Component Expression
Outcome Count (non-negative integers)
Model \(y_i \sim \text{NegBin}(\mu_i, \theta)\), with \(\log \mu_i = X_i \beta\) and \(\operatorname{Var}(y_i) = \mu_i + \mu_i^2/\theta\)
Expected value \(\hat{\mu} = \exp(X_c \hat{\beta})\)
First difference \(\hat{\Delta} = \hat{\mu}_{hi} - \hat{\mu}_{lo} = \exp(X_{hi} \hat{\beta}) - \exp(X_{lo} \hat{\beta})\)
Function MASS::glm.nb() for fitting models; dnbinom() for densities
Parameterization notes Uses the mean–dispersion form: regression is parameterized in terms of the mean \(\mu_i\) with log link. The dispersion parameter \(\theta\) (size in R) controls overdispersion. In dnbinom() you can use either (size, prob) or (size, mu); glm.nb() uses the mean–dispersion form \((\mu_i, \theta)\).
Alternatives Poisson regression (but variance equals mean), zero-inflated variants

12.1.4 Weibull

It’s harder to choose a default model for duration data. We’ve got lots of common options and folks tend to use a semiparametric approach for these data. But to pick a decent default (for both data analysis and an example to learn from), the Weibull model is a good choice.

Component Expression
Outcome Survival or duration time (positive continuous)
Model \(y_i \sim \text{Weibull}(\lambda_i, k)\), with scale parameter \(\lambda_i = \exp(X_i \beta)\) and shape parameter \(k\) (constant across units)
Expected value \(\hat{\mu} = \hat{\lambda} \, \Gamma(1 + 1/k)\) for chosen row \(X_c\)
(the Gamma function \(\Gamma(\cdot)\) adjusts the mean away from the scale \(\lambda\) depending on shape \(k\))
First difference \(\hat{\Delta} = \hat{\mu}_{hi} - \hat{\mu}_{lo} = \hat{\lambda}_{hi}\,\Gamma(1+1/k) - \hat{\lambda}_{lo}\,\Gamma(1+1/k)\)
Function survreg(..., dist = "weibull") in survival package for fitting models; dweibull() for densities; note differences in parameterization, though.
Parameterization notes The base R density function dweibull(x, shape, scale) uses the standard shape–scale parameterization. However, survreg() uses parameterization in terms of \(\lambda\) and \(\sigma\), where shape \(= \frac{1}{\sigma}\) and scale = \(\exp(X_i \beta)\) The two forms are equivalent after reparameterization. To clarify, the log-likelihood used by survreg() is below.
Alternatives Exponential (special case of Weibull with \(k = 1\)); log-normal; gamma, generalized gamma, gompertz, semiparametric Cox proportional hazards
weibull_ll <- function(theta, y, X) {
  # extract parameters for ease of reading
  beta <- theta[1:ncol(X)]  # coefs are first ncol(X) values of theta
  sigma <- theta[ncol(X) + 1]  # sigma is the last value, after coefs
  
  # linear predictor
  eta <- X %*% beta
  # transform parameters
  k <- 1 / sigma               # Weibull shape
  lambda <- exp(eta)           # Weibull scal

  ll <- sum(dweibull(y, shape = k, scale = lambda, log = TRUE))
  return(ll)
}