Unordered Logit

The Inverse-Logit Function

Suppose \(y_i \in \{0, 1\}\). Using logistic regression, we model \(\Pr(y_i = 1)\) as a function of covariates, so that

\[ \Pr(Y_i = 1) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}. \]

The function \(f(z) = \frac{e^z}{1 + e^z}\) is called the inverse-logit function; it maps any real number, such as the linear predictor \(X_i \beta\), to a probability strictly between 0 and 1.

Because the two probabilities must sum to one:

\[ \Pr(Y_i = 0) = 1 - \Pr(Y_i = 1) = \frac{1}{1 + \exp(X_i \beta)}. \]
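As a quick sketch in R: the inverse-logit can be written directly from the definition, and it matches base R's logistic CDF `plogis()`.

```r
# inverse-logit, written directly from the definition
inv_logit <- function(z) exp(z) / (1 + exp(z))

inv_logit(0)   # 0.5
inv_logit(2)   # ~0.881

# base R's logistic CDF gives the same mapping
plogis(2)      # ~0.881
```

For very large \(z\), the form `1 / (1 + exp(-z))` (or `plogis()` itself) is numerically safer than `exp(z) / (1 + exp(z))`, which can overflow.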

The Softmax Function

The softmax generalizes the inverse-logit: it takes \(J\) real inputs (i.e., a vector in \(\mathbb{R}^J\)) and rescales them to sum to one.

\[ \text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^{J} \exp(z_k)}, \qquad j = 1,\ldots,J. \] Because the \(J\) outputs must sum to one, it is natural to interpret the outputs as probabilities.
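A minimal softmax in R (the max-subtraction step is an added numerical-stability trick; it does not change the output):

```r
softmax <- function(z) {
  z <- z - max(z)          # stabilizes exp() without changing the result
  exp(z) / sum(exp(z))
}

p <- softmax(c(-1, 1, 2, 0.5))
round(p, 4)   # 0.0303 0.2242 0.6095 0.1360
sum(p)        # 1
```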

The OJS widget below allows you to experiment with \(J = 4\) inputs and see how the resulting probabilities change.

Adding covariates

Now suppose the outcome takes \(J\) possible values:

\[ y_i \in \{1, 2, \ldots, J\}. \]

Example 1: We might label vote choice in the US as (1) abstain, (2) Republican, (3) Democrat, or (4) other.

Example 2: We might label coup attempts in a given country-year as (1) none attempted, (2) failed attempt, or (3) successful attempt.

For each category \(j\), define a linear predictor \(\eta_{ij} = X_i \beta_j\). Here, each \(\beta_j\) is a vector of coefficients.

We can use the softmax function to convert these linear predictors into probabilities that sum to one.

\[ \Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{J} \exp(\eta_{ik})}, \qquad j = 1, \ldots, J. \]

This is a generalization of the inverse-logit to \(J\) categories. If \(J = 2\), the softmax reduces to the inverse-logit.
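To see the \(J = 2\) case concretely, here is a quick check (assuming the two linear predictors are \(z\) and \(0\)):

```r
z <- 1.3

# softmax over the two linear predictors (z, 0)
p_softmax <- exp(z) / (exp(z) + exp(0))

# inverse-logit of z
p_invlogit <- exp(z) / (1 + exp(z))

all.equal(p_softmax, p_invlogit)  # TRUE
```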

However, this model is not identified because adding any constant to each of the \(\eta_{ij}\) produces the same probabilities.

Example Inputs

| Input \(z_j\) | Softmax \(p_j\) |
|--------------:|----------------:|
| \(-1\)        | 0.0303          |
| \(1\)         | 0.2242          |
| \(2\)         | 0.6095          |
| \(0.5\)       | 0.1360          |

Adding a Constant (\(+2\))

| New Input \(z_j + 2\) | Softmax \(p_j\) |
|----------------------:|----------------:|
| \(-1 + 2 = 1\)        | 0.0303          |
| \(1 + 2 = 3\)         | 0.2242          |
| \(2 + 2 = 4\)         | 0.6095          |
| \(0.5 + 2 = 2.5\)     | 0.1360          |

Adding a constant to all inputs leaves the softmax unchanged. This is why the multinomial logit model requires an identification constraint.
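We can verify this invariance numerically in R:

```r
softmax <- function(z) exp(z) / sum(exp(z))

z <- c(-1, 1, 2, 0.5)

# shifting every input by the same constant leaves the probabilities unchanged
all.equal(softmax(z), softmax(z + 2))  # TRUE
```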

To identify the model, we set \(\eta_{iJ} = 0\). Then for \(j = 1, \ldots, J-1\), we have

\[ \Pr(Y_i = j) = \frac{\exp(X_i \beta_j)} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]

And for the “baseline” category \(J\), we have

\[ \Pr(Y_i = J) = \frac{1} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]
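A small sketch of this parameterization in R, using made-up illustrative coefficients (not estimates from any data):

```r
# baseline-category parameterization with J = 3; category 3 is the baseline
x      <- c(1, 0.5)          # intercept and one covariate
beta_1 <- c(0.2, -0.3)       # hypothetical coefficients for category 1
beta_2 <- c(-0.1, 0.4)       # hypothetical coefficients for category 2

eta <- c(sum(x * beta_1), sum(x * beta_2), 0)  # baseline gets eta = 0
p   <- exp(eta) / sum(exp(eta))
sum(p)  # 1
```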

# load packages
library(tidyverse)

# load data
anes <- read_csv("https://pos5747.github.io/data/red-state-pid.csv") |>
  mutate(pid = factor(pid, ordered = TRUE)) |>
  mutate(pid3 = case_when(
    pid %in% c("1. Strong Democrat",
               "2. Not very strong Democrat") ~ "D",
    pid %in% c("6. Not very strong Republican",
               "7. Strong Republican") ~ "R",
    pid %in% c("3. Independent (leans Democrat)",
               "4. Independent",
               "5. Independent (leans Republican)") ~ "I"
  )) |>
  glimpse()
Rows: 4,596
Columns: 6
$ state_name             <chr> "Oklahoma", "Idaho", "Virginia", "California", …
$ state_median_income    <dbl> 66.1, 81.2, 92.1, 100.1, 97.1, 79.7, 77.5, 100.…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ income                 <dbl> 1.5, 1.5, 1.5, -0.5, -0.5, -0.5, -0.5, -2.5, 0.…
$ pid                    <ord> 7. Strong Republican, 4. Independent, 1. Strong…
$ pid3                   <chr> "R", "I", "D", "R", "I", "I", "I", "I", "D", "D…
library(nnet)
fit <- multinom(pid3 ~ income, data = anes)
# weights:  9 (4 variable)
initial  value 5049.222079 
final  value 5029.924776 
converged
# summarize fit
summary(fit)
Call:
multinom(formula = pid3 ~ income, data = anes)

Coefficients:
  (Intercept)       income
I  -0.1676635 -0.058812795
R  -0.1575483 -0.005553168

Std. Errors:
  (Intercept)     income
I  0.03689560 0.02554012
R  0.03696938 0.02558134

Residual Deviance: 10059.85 
AIC: 10067.85 
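`multinom()` uses the first factor level, here "D", as the baseline category, so the printed coefficients are \(\beta_I\) and \(\beta_R\). We can recover a fitted probability by hand, e.g. at income = 0 (a sketch using the rounded coefficients from the summary above):

```r
# coefficients copied (rounded) from the multinom() summary; "D" is the baseline
b_I <- c(-0.1677, -0.0588)   # (Intercept), income for category I
b_R <- c(-0.1575, -0.0056)   # (Intercept), income for category R
x   <- c(1, 0)               # intercept and income = 0

eta <- c(D = 0, I = sum(x * b_I), R = sum(x * b_R))  # baseline D gets eta = 0
p   <- exp(eta) / sum(exp(eta))
round(p, 3)   # D ~0.370, I ~0.313, R ~0.316
```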
# use {marginaleffects}
library(marginaleffects)
p <- predictions(fit, 
            newdata = datagrid(income = unique)) |>
  glimpse()
Rows: 18
Columns: 12
$ rowid     <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
$ group     <chr> "D", "D", "D", "D", "D", "D", "I", "I", "I", "I", "I", "I", …
$ estimate  <dbl> 0.3514003, 0.3590651, 0.3666381, 0.3741114, 0.3814781, 0.388…
$ std.error <dbl> 0.015903413, 0.011740574, 0.008352337, 0.007175789, 0.009256…
$ statistic <dbl> 22.09590, 30.58327, 43.89646, 52.13523, 41.21338, 29.40193, …
$ p.value   <dbl> 3.460671e-108, 2.043310e-205, 0.000000e+00, 0.000000e+00, 0.…
$ s.value   <dbl> 356.9772, 679.9644, Inf, Inf, Inf, 628.7910, 334.8351, 601.5…
$ conf.low  <dbl> 0.3202302, 0.3360540, 0.3502678, 0.3600472, 0.3633363, 0.362…
$ conf.high <dbl> 0.3825704, 0.3820762, 0.3830083, 0.3881757, 0.3996199, 0.414…
$ pid3      <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
$ income    <dbl> -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, -2.5, -1.5, -0.5, 0.5, 1.5,…
$ df        <dbl> Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, …
ggplot(p, aes(x = income, y = estimate, color = group)) + 
  geom_point() + 
  geom_line()
# load packages
library(brms)

# fit a similar model, but let the coefficients vary across states
fit <- brm(pid3 ~ income*rs_state_median_income + (1 + income | state_name), 
           data = anes, 
           family = categorical(),
           cores = 4, 
           chains = 4, 
           warmup = 2000, 
           iter = 4000,
           backend = "rstan")
# make grid 
grid <- anes |>
  dplyr::select(state_name, state_median_income, rs_state_median_income) |>
  distinct() |>
  mutate(income = NA)

# make predictions
p <- predictions(fit, 
                 variables = list(income = unique),
                 newdata = grid) |>
  mutate(state_name = reorder(state_name, rs_state_median_income))

# make plot
gg <- ggplot(p, aes(x = income, y = estimate, color = group)) +
  geom_line() +
  facet_wrap(vars(state_name))
print(gg)

An observation worth remembering

glimpse(anes)
Rows: 4,596
Columns: 6
$ state_name             <chr> "Oklahoma", "Idaho", "Virginia", "California", …
$ state_median_income    <dbl> 66.1, 81.2, 92.1, 100.1, 97.1, 79.7, 77.5, 100.…
$ rs_state_median_income <dbl> -0.595342224, -0.010181986, 0.412218451, 0.7222…
$ income                 <dbl> 1.5, 1.5, 1.5, -0.5, -0.5, -0.5, -0.5, -2.5, 0.…
$ pid                    <ord> 7. Strong Republican, 4. Independent, 1. Strong…
$ pid3                   <chr> "R", "I", "D", "R", "I", "I", "I", "I", "D", "D…
f <- pid ~ income
fit1 <- MASS::polr(f, data = anes)
fit2 <- multinom(f, data = anes)
# weights:  21 (12 variable)
initial  value 8943.403045 
iter  10 value 8445.069398
final  value 8432.049880 
converged
BIC(fit1, fit2)
     df      BIC
fit1  7 17011.54
fit2 12 16965.30

Even though pid is ordered and the multinomial logit spends five more parameters, it achieves a lower BIC than the ordered logit here: the ordered model's parsimony does not make up for its restrictiveness in this case.