Appendix H — Ordered Logit

H.1 Ordered Outcomes

Suppose an ordered outcome \(y_i \in \{1, 2, \ldots, J\}\), where \(\{1, 2, \ldots, J\}\) correspond to qualitative labels that have a natural ordering.

Example 1. The ANES asks respondents to place themselves along a 7-point ideology scale with positions labeled “extremely liberal,” “liberal,” “slightly liberal,” “moderate/middle of the road,” “slightly conservative,” “conservative,” and “extremely conservative.” We typically let 1, 2, 3, 4, 5, 6, and 7. correspond to these seven ordered labels.

Example 2. The Political Terror Scale (PTS) is a country-year measure of state repression. This measure assigns one of five qualitative labels summarizing the amount of state repression to each country-years based on reports from Amnesty International and the U.S. State Department.

Numeric Coding	Qualitative Label
1	Countries under a secure rule of law, people are not imprisoned for their views, and torture is rare or exceptional. Political murders are extremely rare.
2	There is a limited amount of imprisonment for nonviolent political activity. However, few persons are affected, torture and beatings are exceptional. Political murder is rare.
3	There is extensive political imprisonment, or a recent history of such imprisonment. Execution or other political murders and brutality may be common. Unlimited detention, with or without a trial, for political views is accepted.
4	Civil and political rights violations have expanded to large numbers of the population. Murders, disappearances, and torture are a common part of life. In spite of its generality, on this level terror affects those who interest themselves in politics or ideas.
5	Terror has expanded to the whole population. The leaders of these societies place no limits on the means or thoroughness with which they pursue personal or ideological goals.

H.2 Ordered Logit

Let linear predictor \(\eta_i\) be our usual linear predictor so that \(\eta_i = X_i^\beta\), except without an intercept \(\beta_0\).

Now define \(J-1\) “cutpoints” \(\alpha_1 < \alpha_2 < \cdots < \alpha_{J-1}\) that act as a separate intercept for each category.

The ordered logit defines the the cumulative probabilities as

\[ \Pr(Y_i \leq j) = \text{logit}^{-1}(\alpha_j - \eta_i), \qquad j = 1, \ldots, J-1. \]

Then probabilities for each category are

\[ \begin{align} \Pr(Y_i = 1) &= \text{logit}^{-1}(\alpha_1 - \eta_i) \\ \Pr(Y_i = 2) &= \text{logit}^{-1}(\alpha_2 - \eta_i) - \text{logit}^{-1}(\alpha_{1} - \eta_i) \\ \Pr(Y_i = 3) &= \text{logit}^{-1}(\alpha_3 - \eta_i) - \text{logit}^{-1}(\alpha_{3} - \eta_i) \\ &\vdots\\ \Pr(Y_i = j) &= \overbrace{\text{logit}^{-1}(\alpha_j - \eta_i)}^{\Pr(Y_i \leq j)} - \overbrace{\text{logit}^{-1}(\alpha_{j-1} - \eta_i)}^{^{\Pr(Y_i \leq j-1)}}, \qquad j = 2, \ldots, J-1 \\ &\vdots\\ \Pr(Y_i = J) &= 1 - \text{logit}^{-1}(\alpha_{J-1} - \eta_i). \end{align} \]

The figure below shows this graphically for an outcome with four possible values.

I set the three cutpoints to \(\alpha = (-2.5,-1.3,1.3)\).
I set the linear predictor for an example observation to \(\eta_i = -1.5\).

The figure shows how \(\alpha\) and \(\eta_i\) are translated into \(\Pr(y_i = j)\).

H.3 Example: Party ID by age and race in 2014

As an example, we can model party ID as a function of income.

H.3.1 `polr()`

First, load a data set I’ve created with variables party id (pid) and income.

# load data
anes <- read_csv("data/red-state-pid.csv") |>
  glimpse()

Rows: 4,596
Columns: 5
$ state_name             <chr> "Oklahoma", "Idaho", "Virginia", "California", …
$ state_median_income    <dbl> 66.1, 81.2, 92.1, 100.1, 97.1, 79.7, 77.5, 100.…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ income                 <dbl> 1.5, 1.5, 1.5, -0.5, -0.5, -0.5, -0.5, -2.5, 0.…
$ pid                    <chr> "7. Strong Republican", "4. Independent", "1. S…

Notice that the ordinal party ID lables have a numerial prefix as part of the character string. This is helpful to (1) quickly remind us of the order and (2) ensure that the alphabetic order and ordinal rankings of their values are the same.

Because the dataset is in .csv format, pid is loaded as a character string, but we can easily convert it to an ordered factor.

# load data
anes <- read_csv("https://pos5747.github.io/data/red-state-pid.csv") |>
  mutate(pid = factor(pid, ordered = TRUE)) |>
  glimpse()

Rows: 4,596
Columns: 5
$ state_name             <chr> "Oklahoma", "Idaho", "Virginia", "California", …
$ state_median_income    <dbl> 66.1, 81.2, 92.1, 100.1, 97.1, 79.7, 77.5, 100.…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ income                 <dbl> 1.5, 1.5, 1.5, -0.5, -0.5, -0.5, -0.5, -2.5, 0.…
$ pid                    <ord> 7. Strong Republican, 4. Independent, 1. Strong…

# examine levels of ordinal variable
levels(anes$pid)

[1] "1. Strong Democrat"                "2. Not very strong Democrat"      
[3] "3. Independent (leans Democrat)"   "4. Independent"                   
[5] "5. Independent (leans Republican)" "6. Not very strong Republican"    
[7] "7. Strong Republican"

Because of the numbers I included in the prefix, the default alphabetical ordering is exactly the ordering we want.

# fit ordered logit model
library(MASS)  # be wary of conflicts with filter() and select()
fit <- polr(pid ~ income, data = anes, Hess = TRUE)

# summarize fit
summary(fit)

Call:
polr(formula = pid ~ income, data = anes, Hess = TRUE)

Coefficients:
          Value Std. Error t value
income -0.02192    0.01843  -1.189

Intercepts:
                                                                Value   
1. Strong Democrat|2. Not very strong Democrat                   -1.0666
2. Not very strong Democrat|3. Independent (leans Democrat)      -0.5277
3. Independent (leans Democrat)|4. Independent                    0.0123
4. Independent|5. Independent (leans Republican)                  0.2257
5. Independent (leans Republican)|6. Not very strong Republican   0.7555
6. Not very strong Republican|7. Strong Republican                1.2820
                                                                Std. Error
1. Strong Democrat|2. Not very strong Democrat                    0.0344  
2. Not very strong Democrat|3. Independent (leans Democrat)       0.0313  
3. Independent (leans Democrat)|4. Independent                    0.0302  
4. Independent|5. Independent (leans Republican)                  0.0304  
5. Independent (leans Republican)|6. Not very strong Republican   0.0324  
6. Not very strong Republican|7. Strong Republican                0.0365  
                                                                t value 
1. Strong Democrat|2. Not very strong Democrat                  -30.9733
2. Not very strong Democrat|3. Independent (leans Democrat)     -16.8853
3. Independent (leans Democrat)|4. Independent                    0.4074
4. Independent|5. Independent (leans Republican)                  7.4259
5. Independent (leans Republican)|6. Not very strong Republican  23.3255
6. Not very strong Republican|7. Strong Republican               35.1414

Residual Deviance: 16952.51 
AIC: 16966.51

H.3.2 Probabilities with {marginaleffects}

To interpret the coefficients, we can compute the probability of falling into each category as age varies from 18 to 89 years.

# use {marginaleffects}
library(marginaleffects)
p <- predictions(fit, 
            newdata = datagrid(income = unique)) |>
  glimpse()

Rows: 42
Columns: 12
$ rowid     <dbl> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, …
$ group     <fct> 1. Strong Democrat, 1. Strong Democrat, 1. Strong Democrat, …
$ estimate  <dbl> 0.24575788, 0.24984362, 0.25397441, 0.25814999, 0.26237009, …
$ std.error <dbl> 0.011645748, 0.009052791, 0.007085123, 0.006474830, 0.007661…
$ statistic <dbl> 21.10280, 27.59852, 35.84615, 39.86977, 34.24347, 26.38353, …
$ p.value   <dbl> 7.496604e-99, 1.159208e-167, 2.111349e-281, 0.000000e+00, 5.…
$ s.value   <dbl> 325.9646, 554.5489, 932.3836, Inf, 851.2881, 507.1728, 346.8…
$ conf.low  <dbl> 0.22293263, 0.23210047, 0.24008782, 0.24545956, 0.24735304, …
$ conf.high <dbl> 0.26858312, 0.26758676, 0.26786100, 0.27084043, 0.27738714, …
$ pid       <ord> 1. Strong Democrat, 1. Strong Democrat, 1. Strong Democrat, …
$ income    <dbl> -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, -2.5, -1.5, -0.5, 0.5, 1.5,…
$ df        <dbl> Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, …

ggplot(p, aes(x = income, y = estimate, color = group)) + 
  geom_point() + 
  geom_line()

H.3.3 Computing cumulative probabilities

However, the probabilities above are somewhat awkward to interpret. As the middle categories shrink, it isn’t immediately clear what categories are growing as a result. Instead, we can compute an easier-to-interpret cumulative probability using cumsum() within arrange()ed groups.

cumulative_p <- p %>%
  arrange(income, group) %>%
  group_by(income) %>%
  mutate(cumulative_prob = cumsum(estimate),
         j = as.integer(group),
         cdf_label = paste0("Pr(Y ≤ ", group, ")"),
         cdf_label = reorder(cdf_label, j)) %>%
  ungroup()
ggplot(cumulative_p, aes(x = income, y = cumulative_prob, color = cdf_label)) + 
  geom_line()

Alternatively, w can use position = "stack" with geom_line() or geom_area() to trick ggplot() into plotting the cumulative probabilities for us if we don’t want to compute them manually first.

ggplot(p, aes(x = income, y = estimate, color = group)) +
  geom_line(position = "stack")

ggplot(p, aes(x = income, y = estimate, fill = group)) +
  geom_area(position = "stack")

H.3.4 Example: The varying relationship across years

We can use a hierarchical variant to model the differences across states. Consistent with Gelman et al.’s “red state, blue state” argument, the changes across income might vary across rich and poor states.

With that in mind, we allow varying intercepts and slopes across states and model those using state median income.

# load packages
library(brms)

# fit same model, but coefs vary across years
fit <- brm(pid ~ income*rs_state_median_income + (1 + income | state_name), 
           data = anes, 
           family = cumulative(),
           cores = 4, 
           chains = 4, 
           backend = "rstan")

Now we can plot the cumulative probabilities for each of the states and see that the effect of income does indeed vary across the states.

# make grid 
grid <- anes %>%
  dplyr::select(state_name, state_median_income, rs_state_median_income) %>%
  distinct() %>%
  mutate(income = NA) |>
  glimpse()

Rows: 51
Columns: 4
$ state_name             <chr> "Oklahoma", "Idaho", "Virginia", "California", …
$ state_median_income    <dbl> 66.1, 81.2, 92.1, 100.1, 97.1, 79.7, 77.5, 102.…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ income                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

# make predictions
p <- predictions(fit, 
                 variables = list(income = unique),
                 newdata = grid) |>
  mutate(state_name = reorder(state_name, rs_state_median_income)) |>
  glimpse()

Rows: 2,142
Columns: 10
$ rowid                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
$ rowidcf                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
$ group                  <fct> 1. Strong Democrat, 1. Strong Democrat, 1. Stro…
$ estimate               <dbl> 0.2455175, 0.2513170, 0.2454534, 0.2646395, 0.2…
$ conf.low               <dbl> 0.1562062, 0.1742986, 0.1742482, 0.2087313, 0.1…
$ conf.high              <dbl> 0.3409455, 0.3523408, 0.3200198, 0.3278561, 0.3…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ state_name             <fct> Oklahoma, Idaho, Virginia, California, Colorado…
$ income                 <dbl> -2.5, -2.5, -2.5, -2.5, -2.5, -2.5, -2.5, -2.5,…
$ df                     <dbl> Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, In…

# make plot
ggplot(p, aes(x = income, y = estimate, fill = group)) +
  geom_area(position = "stack") +
  facet_wrap(vars(state_name), ncol = 4)  +
  scale_fill_brewer(palette = "RdBu", direction = -1)