Logistic Regression

Table of contents
  1. Quick Recap of Logistic Function and Logit
  2. Binary Logistic Regression
    1. Interpretation of the Coefficients
  3. MLE for Logistic Regression
    1. Testing for Estimator Significance
  4. Multinomial Logistic Regression
  5. Softmax Coding

Quick Recap of Logistic Function and Logit

See here


Binary Logistic Regression

Let $Y$ be a binary response variable.

We want to model $p(X)$, the probability that $Y = 1$ given $X$:

$$ p(X) = P(Y = 1 \mid X) $$

In this case, our baseline class is $Y = 0$.

In logistic regression, we model $p(X)$ using the logistic function:

$$ p(X; \beta) = \frac{e^{f(X;\, \beta)}}{1 + e^{f(X;\, \beta)}} $$

Where $f(X; \beta)$ is a linear function of $X$, which is also the logit (log-odds) of $p(X)$:

$$ f(X; \beta) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p $$

So logistic regression is still a linear model.
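
As a quick numerical illustration (not from the original text), here is a minimal sketch that evaluates $p(X; \beta)$ for a hypothetical coefficient vector and observation; the names and values are made up:

```python
import numpy as np

def logistic(f):
    """Logistic (sigmoid) function: e^f / (1 + e^f), equivalently 1 / (1 + e^{-f})."""
    return 1.0 / (1.0 + np.exp(-f))

# Hypothetical coefficients (beta_0, beta_1, beta_2) and one observation (X_1, X_2).
beta = np.array([-1.0, 0.8, -0.5])
x = np.array([2.0, 0.5])

f = beta[0] + x @ beta[1:]   # f(X; beta) = beta_0 + beta_1 X_1 + beta_2 X_2 (the log-odds)
p = logistic(f)              # p(X; beta), a probability in (0, 1)
print(f, p)
```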

Interpretation of the Coefficients

  • A positive $\hat{\beta}_j$ means that $X_j$ is associated with a higher probability of $Y = 1$.
  • A one-unit increase in $X_j$ increases the log-odds of $Y = 1$ versus $Y = 0$ by $\hat{\beta}_j$, i.e., it multiplies the odds by $e^{\hat{\beta}_j}$ (see the sketch below).
    • Because $f(X; \beta)$ is the logit (log-odds) of $p$, the probability of $Y = 1$.
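
Here is a minimal sketch of that odds interpretation, with hypothetical numbers (the coefficients and values below are made up):

```python
import numpy as np

def prob(x1, beta):
    """p(X; beta) for a single predictor X_1 with coefficients (beta_0, beta_1)."""
    f = beta[0] + beta[1] * x1
    return np.exp(f) / (1.0 + np.exp(f))

beta_hat = np.array([-1.0, 0.8])     # hypothetical fitted intercept and slope

p_before = prob(2.0, beta_hat)       # X_1 = 2
p_after  = prob(3.0, beta_hat)       # X_1 = 3 (one unit higher)

odds_before = p_before / (1 - p_before)
odds_after  = p_after  / (1 - p_after)

print(np.log(odds_after) - np.log(odds_before))       # equals beta_hat[1]
print(odds_after / odds_before, np.exp(beta_hat[1]))   # both equal e^{beta_hat[1]}
```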

MLE for Logistic Regression

To estimate $\beta$ we use maximum likelihood estimation (MLE).

The likelihood function is:

$$ \mathcal{L}(\beta) = \prod_{i=1}^n p(x_i; \beta)^{y_i} (1 - p(x_i; \beta))^{1 - y_i} $$

A good $\beta$ makes $p(x_i; \beta)$ large when $y_i = 1$ and $1 - p(x_i; \beta)$ large when $y_i = 0$; maximizing the likelihood does exactly that.

The log-likelihood function is:

\[\begin{align*} \ell(\beta) &= \sum_{i=1}^n \left[ y_i \log p(x_i; \beta) + (1 - y_i) \log (1 - p(x_i; \beta)) \right] \\[0.5em] &= \sum_{i=1}^n \left[ y_i f(x_i; \beta) - \log(1 + e^{f(x_i; \beta)}) \right] \end{align*}\]
Derivation

Recall that the logit is:

\[f(x_i; \beta) = \log\left(\frac{p(x_i; \beta)}{1 - p(x_i; \beta)}\right) = \log(p(x_i; \beta)) - \log(1 - p(x_i; \beta))\]

And

\[\log(1 - p(x_i; \beta)) = \log\left(\frac{1}{1 + e^{f(x_i; \beta)}}\right) = -\log(1 + e^{f(x_i; \beta)})\]

Then:

\[\begin{align*} &y_i \log p(x_i; \beta) + (1 - y_i) \log (1 - p(x_i; \beta)) \\[0.5em] =\; &y_i \log p(x_i; \beta) - y_i \log(1 - p(x_i; \beta)) + \log(1 - p(x_i; \beta)) \\[0.5em] =\; &y_i f(x_i; \beta) - \log(1 + e^{f(x_i; \beta)}) \end{align*}\]

We take the derivative of $\ell(\beta)$ with respect to $\beta$ and set it to zero:

\[\frac{\partial \ell(\beta)}{\partial \beta} = 0\]

There is no closed-form solution for $\hat{\beta}$, so we use iterative methods (e.g., Newton-Raphson) to find it; a sketch follows below.
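
As a rough sketch of such an iterative method (one possible choice, not the only one), here is Newton-Raphson on the log-likelihood above, using the gradient $X^T(y - p)$ and Hessian $-X^T W X$ with $W = \mathrm{diag}(p_i(1 - p_i))$; the synthetic data is made up just to exercise the routine:

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Maximize the logistic log-likelihood with Newton-Raphson.

    X is assumed to already include a leading column of ones for the intercept.
    Gradient:  X^T (y - p)
    Hessian:  -X^T W X,  where W = diag(p_i * (1 - p_i))
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        w = p * (1 - p)
        hess = X.T @ (X * w[:, None])        # X^T W X
        step = np.linalg.solve(hess, grad)   # Newton direction
        beta = beta + step
        if np.max(np.abs(step)) < tol:       # stop once the update is tiny
            break
    return beta

# Made-up data: 500 observations, 2 predictors plus an intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat = fit_logistic_newton(X, y)
print(beta_hat)   # should land near true_beta
```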

Testing for Estimator Significance

Remember that MLE estimators are asymptotically normal.

Therefore, you can test for the significance of the estimators using the z-statistic:

$$ z = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} $$
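
A hedged sketch of this test using statsmodels (assuming it is installed; the data below is synthetic and hypothetical). The fitted results expose the estimates, their standard errors, and the resulting z-statistics:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: 500 observations, 2 predictors.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
X = sm.add_constant(x)                 # add an intercept column
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

res = sm.Logit(y, X).fit()

print(res.params)    # beta_hat_j
print(res.bse)       # SE(beta_hat_j)
print(res.tvalues)   # z = beta_hat_j / SE(beta_hat_j)
print(res.pvalues)   # two-sided p-values from the normal approximation
```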


Multinomial Logistic Regression

When the response variable has more than two classes, let’s say $K$ classes, we use multinomial logistic regression.

The probability model is:

$$ P(Y = k \mid X; \beta) = \frac{e^{\beta_{k0} + \beta_k^T X}} {1 + \sum_{l=1}^{K-1} e^{\beta_{l0} + \beta_l^T X}} $$

This formula holds for $k = 1, \ldots, K-1$: there are only $K-1$ sets of coefficients, because we set class $K$ as the baseline with $\beta_{K0} = 0$ and $\beta_{K} = \mathbf{0}$, so that $P(Y = K \mid X; \beta) = 1 / \left(1 + \sum_{l=1}^{K-1} e^{\beta_{l0} + \beta_l^T X}\right)$.
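
To make the baseline convention concrete, here is a small sketch (hypothetical coefficients) that evaluates these probabilities for $K = 3$ classes, with class $K$ as the baseline:

```python
import numpy as np

def multinomial_probs(x, B):
    """Class probabilities with class K as the baseline.

    B has shape (K-1, p+1): row l holds (beta_{l0}, beta_l) for classes 1..K-1;
    the baseline class K implicitly has all-zero coefficients.
    """
    x1 = np.concatenate([[1.0], x])              # prepend 1 for the intercept
    scores = B @ x1                              # beta_{l0} + beta_l^T x, l = 1..K-1
    denom = 1.0 + np.sum(np.exp(scores))
    return np.append(np.exp(scores), 1.0) / denom   # last entry is P(Y = K | X)

# Hypothetical coefficients for K = 3 classes and p = 2 predictors.
B = np.array([[ 0.2, 1.0, -0.5],
              [-0.3, 0.4,  0.8]])
x = np.array([1.5, -2.0])
p = multinomial_probs(x, B)
print(p, p.sum())   # probabilities sum to 1
```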


Softmax Coding

In multinomial logistic regression, we usually set the last class $K$ as the baseline, with $\beta_{K0} = 0$ and $\beta_K = \mathbf{0}$.

But with softmax coding, we do not set any class as the baseline.

In softmax coding, the log-odds (or logit) of class $k$ versus any other class $k'$ is:

$$ f(x; \beta) = (\beta_{k0}-\beta_{k'0}) + (\beta_{k1} - \beta_{k'1})x_1 + \ldots + (\beta_{kp} - \beta_{k'p})x_p $$

With the baseline coding above, in contrast, the log-odds (or logit) was always the class $k$ in question versus the baseline class $K$, whose coefficients are all zero:

$$ f(x; \beta) = \beta_{k0} + \beta_{k1}x_1 + \ldots + \beta_{kp}x_p $$
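
A sketch contrasting the two codings (hypothetical coefficients again): with softmax coding every class gets its own coefficient vector, the log-odds between any two classes depends only on the difference of their coefficients, and subtracting one class's coefficients from all the rows recovers the baseline coding without changing the probabilities.

```python
import numpy as np

def softmax_probs(x, B):
    """Softmax coding: one coefficient row per class, no baseline.

    B has shape (K, p+1); P(Y = k | x) = exp(score_k) / sum_l exp(score_l).
    """
    x1 = np.concatenate([[1.0], x])
    scores = B @ x1
    e = np.exp(scores - scores.max())    # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical coefficients for K = 3 classes and p = 2 predictors.
B = np.array([[ 0.2,  1.0, -0.5],
              [-0.3,  0.4,  0.8],
              [ 0.5, -0.1,  0.3]])
x = np.array([1.5, -2.0])

p = softmax_probs(x, B)
x1 = np.concatenate([[1.0], x])

# Log-odds of class k versus class k' is the difference of their linear predictors.
k, k_prime = 0, 2
print(np.log(p[k] / p[k_prime]), (B[k] - B[k_prime]) @ x1)   # identical

# Subtracting the last class's row from every row (baseline coding) leaves probabilities unchanged.
print(softmax_probs(x, B - B[-1]))
```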