MRP

Here are the packages we’re using:

# generally useful packages
library(tidyverse)
library(marginaleffects)
library(modelsummary)
library(ggrepel)

# for fitting hierarchical models
library(lme4)
library(brms)  # see also rstanarm::stan_lmer()
library(rstan)

# for post-processing simulations
library(tidybayes)

MRP

“Multilevel regression with poststratification” or “Mister P”
We have sufficiently large polls to estimate national opinion well, but how can we use these to estimate state-level opinion well across the fifty states?

Two-step process

Use a national survey data (e.g., N = 2,000) and a hierarchical model to estimate public opinion within small subgroups defined by state and demographic variables. Example: Men age 18-29 without military experience and no HS degree living in Alabama.
Look up the fraction of each state that falls into each demographic category. Weight the predictions for each state-demographic groups by these fractions to obtain an estimate of state-level opion.

Two-step process: Step 2(a)

Find the populations of demographic subgroups in each state.

This might come from the Census or CPS.

Here’s an example packaged with {marginaleffects}.

demographics <- get_dataset("ces_demographics") |>
  glimpse()

Rows: 5,929
Columns: 6
$ state     <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
$ age       <fct> 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, …
$ education <fct> No HS, No HS, No HS, No HS, High school, High school, High school, Hig…
$ gender    <chr> "Man", "Man", "Woman", "Woman", "Man", "Man", "Woman", "Woman", "Man",…
$ military  <dbl> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1…
$ percent   <dbl> 0.318, 0.212, 0.530, 0.318, 1.803, 0.636, 4.454, 1.803, 0.954, 0.530, …

Two-step process: Step 1

Use a national survey data (e.g., N = 2,000) and a hierarchical model to estimate public opinion within each each subgroup.

survey <- get_dataset("ces_survey") |>
  glimpse()

Rows: 3,000
Columns: 6
$ state     <chr> "TX", "NJ", "CO", "PA", "PA", "IA", "CA", "NY", "CT", "NH", "WI", "UT"…
$ defund    <dbl> 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1…
$ gender    <chr> "Woman", "Woman", "Woman", "Man", "Woman", "Man", "Man", "Woman", "Wom…
$ age       <fct> 70+, 18-29, 60-69, 50-59, 70+, 40-49, 70+, 18-29, 18-29, 60-69, 60-69,…
$ education <fct> Some college, 4 year, 4 year, 4 year, High school, Some college, 4 yea…
$ military  <dbl> 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0…

Two-step process: Step 2(b)

Fit the model.

# fit model
f <- defund ~ age + education + military + gender + (1 + gender | state)
model <- brm(f, family = bernoulli, data = survey, chains = 4, cores = 4)

Two-step process: Step 2(c)

Create the predictions.

p <- predictions(model, newdata = demographics) |>
  glimpse()

Rows: 5,929
Columns: 11
$ rowid     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,…
$ estimate  <dbl> 0.435, 0.436, 0.475, 0.474, 0.459, 0.459, 0.498, 0.498, 0.540, 0.540, …
$ conf.low  <dbl> 0.320, 0.316, 0.346, 0.342, 0.381, 0.375, 0.407, 0.403, 0.459, 0.457, …
$ conf.high <dbl> 0.553, 0.557, 0.599, 0.600, 0.535, 0.542, 0.586, 0.588, 0.615, 0.620, …
$ state     <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
$ age       <fct> 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, 18-29, …
$ education <fct> No HS, No HS, No HS, No HS, High school, High school, High school, Hig…
$ gender    <chr> "Man", "Man", "Woman", "Woman", "Man", "Man", "Woman", "Woman", "Man",…
$ military  <dbl> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1…
$ percent   <dbl> 0.318, 0.212, 0.530, 0.318, 1.803, 0.636, 4.454, 1.803, 0.954, 0.530, …
$ df        <dbl> Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, …

Two-step process: Step 2(d)

Weight the predictions.

ests <- p |>
  mutate(w_ests = estimate * percent) |>
  group_by(state) |>
  summarize(mrp_est = sum(w_ests)) |>
  ungroup() |>
  glimpse()

Rows: 48
Columns: 2
$ state   <chr> "AL", "AR", "AZ", "CA", "CO", "CT", "DE", "FL", "GA", "IA", "ID", "IL", …
$ mrp_est <dbl> 40.9, 39.9, 38.0, 45.1, 43.7, 40.1, 38.4, 39.2, 39.3, 38.8, 40.1, 40.7, …

Plot

gg_ests <- ests |>
  mutate(state = reorder(state, mrp_est))
ggplot(gg_ests, aes(x = mrp_est, y = state)) + 
  geom_point()

Question

We used this model. What is it? What others might we have used?

defund ~ age + education + military + gender + (1 + gender | state)