Appendix I — Multinomial Logit

I.1 The Inverse-Logit Function

Suppose \(y_i \in \{0, 1\}\). Using logistic regression, we model \(\Pr(Y_i = 1)\) as a function of covariates, so that

\[ \Pr(Y_i = 1) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}. \]

The function \(f(z) = \frac{e^z}{1 + e^z}\) is called the inverse-logit function; it maps any real number, such as the linear predictor \(X_i \beta\), to a probability between 0 and 1.

Because the two probabilities must sum to one:

\[ \Pr(Y_i = 0) = 1 - \Pr(Y_i = 1) = \frac{1}{1 + \exp(X_i \beta)}. \]
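As a quick numerical check, here is a minimal sketch in Python (NumPy assumed; the linear-predictor values are illustrative, not from the text) showing that the two probabilities sum to one:

```python
import numpy as np

def inv_logit(z):
    """Inverse-logit: map a real number to a probability in (0, 1)."""
    return np.exp(z) / (1 + np.exp(z))

# Illustrative values of the linear predictor X_i * beta (hypothetical)
eta = np.array([-2.0, 0.0, 1.5])

p1 = inv_logit(eta)         # Pr(Y_i = 1)
p0 = 1 / (1 + np.exp(eta))  # Pr(Y_i = 0)

print(p1 + p0)  # each entry equals 1.0
```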

I.2 The Softmax Function

The softmax generalizes the inverse-logit: it takes \(J\) real inputs (i.e., a vector in \(\mathbb{R}^J\)) and rescales them to sum to one.

\[ \text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^{J} \exp(z_k)}, \qquad j = 1,\ldots,J. \]

Because the \(J\) outputs must sum to one, it is natural to interpret them as probabilities.
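A minimal sketch of the softmax (Python/NumPy assumed; the \(J = 4\) inputs are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax: rescale J real numbers to positive values that sum to one."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 1.0, 2.0, 0.5])  # J = 4 illustrative inputs
p = softmax(z)
print(p)        # [0.0303 0.2242 0.6095 0.1360] (rounded)
print(p.sum())  # 1.0
```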

The OJS widget below allows you to experiment with \(J = 4\) inputs and see how the resulting probabilities change.

I.3 Adding covariates

Now suppose the outcome takes \(J\) possible values:

\[ y_i \in \{1, 2, \ldots, J\}. \]

Example 1: We might label vote choice in the US as (1) abstain, (2) Republican, (3) Democrat, or (4) other.

Example 2: We might label coup attempts in a given country-year as (0) none attempted, (1) failed attempt, or (2) successful attempt.

For each category \(j\), define a linear predictor \(\eta_{ij} = X_i \beta_j\). Here, each \(\beta_j\) is a vector of coefficients.

We can use the softmax function to convert these linear predictors into probabilities that sum to one.

\[ \Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{J} \exp(\eta_{ik})}, \qquad j = 1, \ldots, J. \]

This is a generalization of the inverse-logit to \(J\) categories. If \(J = 2\) and one of the two linear predictors is fixed at zero, the softmax reduces to the inverse-logit.
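A short sketch of this computation (Python/NumPy assumed; the sample size, number of covariates, and coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, J = 5, 3, 4             # hypothetical sizes: observations, covariates, categories
X = rng.normal(size=(n, p))   # design matrix (row i is X_i)
B = rng.normal(size=(p, J))   # column j is the coefficient vector beta_j

eta = X @ B                   # eta[i, j] = X_i beta_j
num = np.exp(eta)
P = num / num.sum(axis=1, keepdims=True)  # P[i, j] = Pr(Y_i = j)

print(P.round(3))
print(P.sum(axis=1))          # each row sums to one
```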

However, this model is not identified because adding any constant to each of the \(\eta_{ij}\) produces the same probabilities.

Example Inputs

| Input \(z_j\) | Softmax \(p_j\) |
|---------------|-----------------|
| \(-1\)        | 0.0303          |
| \(1\)         | 0.2242          |
| \(2\)         | 0.6095          |
| \(0.5\)       | 0.1360          |

Adding a Constant (\(+2\))

| New Input \(z_j + 2\) | Softmax \(p_j\) |
|-----------------------|-----------------|
| \(-1 + 2 = 1\)        | 0.0303          |
| \(1 + 2 = 3\)         | 0.2242          |
| \(2 + 2 = 4\)         | 0.6095          |
| \(0.5 + 2 = 2.5\)     | 0.1360          |

Adding a constant to all inputs leaves the softmax unchanged. This is why the multinomial logit model requires an identification constraint.
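The same invariance can be checked numerically; here is a minimal sketch (Python/NumPy assumed, reusing the illustrative inputs from the tables above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 1.0, 2.0, 0.5])
# Shifting every input by the same constant leaves the probabilities unchanged.
print(np.allclose(softmax(z), softmax(z + 2)))  # True
```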

To identify the model, we set \(\beta_J = 0\), so that \(\eta_{iJ} = 0\) and \(\exp(\eta_{iJ}) = 1\). Then for \(j = 1, \ldots, J-1\), we have

\[ \Pr(Y_i = j) = \frac{\exp(X_i \beta_j)} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]

And for the “baseline” category \(J\), we have

\[ \Pr(Y_i = J) = \frac{1} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]
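A minimal sketch of the identified model (Python/NumPy assumed; sizes and coefficient values are hypothetical), with the baseline category's coefficients fixed at zero:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, J = 4, 2, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, J - 1))   # only J - 1 free coefficient vectors; beta_J = 0

eta = X @ B                              # eta[i, j] for j = 1, ..., J-1
denom = 1 + np.exp(eta).sum(axis=1)      # 1 + sum_k exp(X_i beta_k)
P_nonbase = np.exp(eta) / denom[:, None] # Pr(Y_i = j) for the first J-1 categories
P_base = 1 / denom                       # Pr(Y_i = J), the baseline category

P = np.column_stack([P_nonbase, P_base])
print(P.round(3))
print(P.sum(axis=1))                     # each row sums to one
```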