Appendix I — Multinomial Logit

I.1 The Inverse-Logit Function

Suppose \(y_i \in \{0, 1\}\). Using logistic regression, we model \(\Pr(Y_i = 1)\) as a function of covariates, so that

\[ \Pr(Y_i = 1) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}. \]

The function \(f(z) = \frac{e^z}{1 + e^z}\) is called the inverse-logit function; it maps any real number, such as the linear predictor \(X_i \beta\), to a probability between 0 and 1.

Because the two probabilities must sum to one:

\[ \Pr(Y_i = 0) = 1 - \Pr(Y_i = 1) = \frac{1}{1 + \exp(X_i \beta)}. \]
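As a quick numerical check, here is a minimal sketch in Python (NumPy assumed; the linear-predictor values are illustrative, not from the text) showing that the two probabilities sum to one:

```python
import numpy as np

def inv_logit(z):
    """Inverse-logit: map a real number to a probability in (0, 1)."""
    return np.exp(z) / (1 + np.exp(z))

# Illustrative values of the linear predictor X_i * beta (hypothetical)
eta = np.array([-2.0, 0.0, 1.5])

p1 = inv_logit(eta)         # Pr(Y_i = 1)
p0 = 1 / (1 + np.exp(eta))  # Pr(Y_i = 0)

print(p1 + p0)  # each entry equals 1.0
```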

I.2 The Softmax Function

The softmax generalizes the inverse-logit: it takes \(J\) real inputs (i.e., a vector in \(\mathbb{R}^J\)) and rescales them to sum to one.

\[ \text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^{J} \exp(z_k)}, \qquad j = 1,\ldots,J. \]

Because the \(J\) outputs must sum to one, it is natural to interpret them as probabilities.
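A minimal sketch of the softmax (Python/NumPy assumed; the \(J = 4\) inputs are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax: rescale J real numbers to positive values that sum to one."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 1.0, 2.0, 0.5])  # J = 4 illustrative inputs
p = softmax(z)
print(p)        # [0.0303 0.2242 0.6095 0.1360] (rounded)
print(p.sum())  # 1.0
```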

The OJS widget below allows you to experiment with \(J = 4\) inputs and see how the resulting probabilities change.

I.3 Adding covariates

Now suppose the outcome takes \(J\) possible values:

\[ y_i \in \{1, 2, \ldots, J\}. \]

Example 1: We might label vote choice in the US as (1) abstain, (2) Republican, (3) Democrat, or (4) other.

Example 2: We might label coup attempts in a given country-year as (0) none attempted, (1) failed attempt, or (2) successful attempt.

For each category \(j\), define a linear predictor \(\eta_{ij} = X_i \beta_j\). Here, each \(\beta_j\) is a vector of coefficients.

We can use the softmax function to convert these linear predictors into probabilities that sum to one.

\[ \Pr(Y_i = j) = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{J} \exp(\eta_{ik})}, \qquad j = 1, \ldots, J. \]

This is a generalization of the inverse-logit to \(J\) categories. If \(J = 2\) and one of the two linear predictors is fixed at zero, the softmax reduces to the inverse-logit.
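A short sketch of this computation (Python/NumPy assumed; the sample size, number of covariates, and coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, J = 5, 3, 4             # hypothetical sizes: observations, covariates, categories
X = rng.normal(size=(n, p))   # design matrix (row i is X_i)
B = rng.normal(size=(p, J))   # column j is the coefficient vector beta_j

eta = X @ B                   # eta[i, j] = X_i beta_j
num = np.exp(eta)
P = num / num.sum(axis=1, keepdims=True)  # P[i, j] = Pr(Y_i = j)

print(P.round(3))
print(P.sum(axis=1))          # each row sums to one
```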

However, this model is not identified because adding any constant to each of the \(\eta_{ij}\) produces the same probabilities.

Example Inputs

| Input \(z_j\) | Softmax \(p_j\) |
|---------------|-----------------|
| \(-1\)        | 0.0303          |
| \(1\)         | 0.2242          |
| \(2\)         | 0.6095          |
| \(0.5\)       | 0.1360          |

Adding a Constant (\(+2\))

| New Input \(z_j + 2\) | Softmax \(p_j\) |
|-----------------------|-----------------|
| \(-1 + 2 = 1\)        | 0.0303          |
| \(1 + 2 = 3\)         | 0.2242          |
| \(2 + 2 = 4\)         | 0.6095          |
| \(0.5 + 2 = 2.5\)     | 0.1360          |

Adding a constant to all inputs leaves the softmax unchanged. This is why the multinomial logit model requires an identification constraint.
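The same invariance can be checked numerically; here is a minimal sketch (Python/NumPy assumed, reusing the illustrative inputs from the tables above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 1.0, 2.0, 0.5])
# Shifting every input by the same constant leaves the probabilities unchanged.
print(np.allclose(softmax(z), softmax(z + 2)))  # True
```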

To identify the model, we set \(\beta_J = 0\), so that \(\eta_{iJ} = 0\) and \(\exp(\eta_{iJ}) = 1\). Then for \(j = 1, \ldots, J-1\), we have

\[ \Pr(Y_i = j) = \frac{\exp(X_i \beta_j)} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]

And for the “baseline” category \(J\), we have

\[ \Pr(Y_i = J) = \frac{1} {1 + \sum_{k=1}^{J-1} \exp(X_i \beta_k)}. \]
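A minimal sketch of the identified model (Python/NumPy assumed; sizes and coefficient values are hypothetical), with the baseline category's coefficients fixed at zero:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, J = 4, 2, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, J - 1))   # only J - 1 free coefficient vectors; beta_J = 0

eta = X @ B                              # eta[i, j] for j = 1, ..., J-1
denom = 1 + np.exp(eta).sum(axis=1)      # 1 + sum_k exp(X_i beta_k)
P_nonbase = np.exp(eta) / denom[:, None] # Pr(Y_i = j) for the first J-1 categories
P_base = 1 / denom                       # Pr(Y_i = J), the baseline category

P = np.column_stack([P_nonbase, P_base])
print(P.round(3))
print(P.sum(axis=1))                     # each row sums to one
```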