Bayes’ Theorem

Table of contents
  1. Marginal Probability
  2. The Law of Total Probability
    1. Marginalization
  3. Interpreting Bayes’ Theorem
    1. Prior
    2. Posterior
    3. Marginal

Marginal Probability

A marginal probability is the probability of an event occurring, irrespective of other events.

So simply put, it is the probability of an event without any conditions.

This term is used in contrast to a conditional probability, which is the probability of an event that depends on other events.
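As a minimal sketch (with made-up numbers), consider a joint distribution over two binary events; the marginal of one event ignores the other entirely, while a conditional depends on it:

```python
# Hypothetical joint distribution P(rain, traffic) with made-up numbers.
joint = {
    ("rain", "traffic"): 0.30,
    ("rain", "no traffic"): 0.10,
    ("no rain", "traffic"): 0.15,
    ("no rain", "no traffic"): 0.45,
}

# Marginal probability of traffic: sum over all values of rain.
p_traffic = sum(p for (r, t), p in joint.items() if t == "traffic")

# Conditional probability of traffic given rain: restrict to rain, renormalize.
p_rain = sum(p for (r, t), p in joint.items() if r == "rain")
p_traffic_given_rain = joint[("rain", "traffic")] / p_rain

print(p_traffic)             # ≈ 0.45
print(p_traffic_given_rain)  # ≈ 0.75
```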


The Law of Total Probability

The law of total probability is a theorem that relates marginal probabilities to conditional probabilities.

If the events $A_i$ form a partition of a sample space $\Omega$, then for any event $B$ in $\Omega$,

$$ P(B) = \sum_{i} P(B|A_i)P(A_i) $$
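For instance, with illustrative numbers: take a two-event partition with $P(A_1) = 0.3$, $P(A_2) = 0.7$, $P(B|A_1) = 0.9$, and $P(B|A_2) = 0.2$. Then

$$ P(B) = 0.9 \cdot 0.3 + 0.2 \cdot 0.7 = 0.27 + 0.14 = 0.41 $$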

Marginalization

Marginalization is the process of summing over all possible values of a variable to find the marginal distribution of another variable.

$$ P(B) = \sum_{i} P(B, A_i) $$

So it’s the same idea as the law of total probability: each joint term factors as $P(B, A_i) = P(B|A_i)P(A_i)$, so the two sums are identical, just written without the conditionals.
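A quick numerical check (same illustrative numbers as above) confirms the two formulas agree:

```python
# Illustrative numbers: priors P(A_i) and conditionals P(B|A_i).
p_a = [0.3, 0.7]          # P(A_1), P(A_2)
p_b_given_a = [0.9, 0.2]  # P(B|A_1), P(B|A_2)

# Law of total probability: conditionals weighted by the priors.
p_b_total = sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Marginalization: first form the joints P(B, A_i) = P(B|A_i) P(A_i),
# then sum them out.
p_joint = [pb * pa for pb, pa in zip(p_b_given_a, p_a)]
p_b_marginal = sum(p_joint)

assert p_b_total == p_b_marginal  # both ≈ 0.41
```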


Interpreting Bayes’ Theorem

Let the events $A_i$ form a partition of a sample space $\Omega$.

Then for any event $B$ in $\Omega$, Bayes’ theorem states that:

$$ P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B)} = \frac{P(B|A_i)P(A_i)}{\sum_{j} P(B|A_j)P(A_j)} $$
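As a minimal sketch (reusing the illustrative numbers from above), the posterior over a discrete partition can be computed directly from this formula:

```python
# Illustrative numbers: priors P(A_i) and likelihoods P(B|A_i).
priors = [0.3, 0.7]       # P(A_1), P(A_2)
likelihoods = [0.9, 0.2]  # P(B|A_1), P(B|A_2)

# Numerators P(B|A_i) P(A_i); their sum is the marginal P(B).
numerators = [lik * pri for lik, pri in zip(likelihoods, priors)]
p_b = sum(numerators)

# Posteriors P(A_i|B) by Bayes' theorem.
posteriors = [n / p_b for n in numerators]
print(posteriors)  # ≈ [0.659, 0.341]
```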


In Bayesian statistics, probability is interpreted as a measure of belief.

So this rule tells us how our beliefs are updated given new observations.

Prior

A prior is a probability distribution that represents our beliefs before observing any new data.

So in our case $P(A_i)$ is the prior.

In Bayesian inference, the prior is often chosen to be a uniform distribution as a non-informative prior (i.e. it encodes no prior knowledge and introduces less bias).
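For example (a hypothetical coin-bias setup), a uniform prior over a discrete grid of candidate biases assigns equal belief to every candidate before any flips are observed:

```python
# Hypothetical setup: infer a coin's bias P(heads) over a discrete grid.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]  # candidate values of P(heads)

# Non-informative (uniform) prior: equal belief in each candidate.
prior = [1 / len(grid)] * len(grid)  # [0.2, 0.2, 0.2, 0.2, 0.2]

# After observing one head, weight each candidate by its likelihood
# (the candidate bias itself), then normalize.
numerators = [theta * p for theta, p in zip(grid, prior)]
total = sum(numerators)
posterior = [n / total for n in numerators]
print(posterior)  # ≈ [0.0, 0.1, 0.2, 0.3, 0.4]
```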

Posterior

A posterior is a probability distribution that represents our beliefs after observing new data.

In our case $P(A_i | B)$ is the posterior.

The new observation is represented by $B$, and $P(A_i | B)$ is our updated belief about $A_i$.
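A useful consequence (sketched here with hypothetical numbers) is that today's posterior becomes tomorrow's prior, so beliefs can be refined one observation at a time:

```python
def update(prior, likelihoods):
    """One Bayesian update: P(A_i|B) is proportional to P(B|A_i) P(A_i)."""
    numerators = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical two-hypothesis example; each observation has its own P(B|A_i).
belief = [0.5, 0.5]                      # initial prior over A_1, A_2
observations = [[0.9, 0.2], [0.8, 0.3]]  # likelihoods for two observations

for liks in observations:
    belief = update(belief, liks)  # the posterior becomes the next prior
print(belief)  # ≈ [0.923, 0.077]
```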

Marginal

In real life, we often have no direct way to compute $P(B)$: the sum over the partition (or the integral, in the continuous case) can be intractable.

In this case, the entire posterior is approximated using methods such as MCMC (Markov Chain Monte Carlo), which can sample from it using only the unnormalized numerator $P(B|A_i)P(A_i)$, since $P(B)$ is just a normalizing constant.
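A minimal Metropolis sketch (hypothetical, with a made-up unnormalized target) shows why this works: the acceptance rule only uses a ratio of posteriors, so the unknown $P(B)$ cancels out:

```python
import math
import random

def unnormalized_posterior(theta):
    # Hypothetical target: likelihood * prior, without the normalizing P(B).
    # Here: a standard-normal likelihood times a flat prior on [-5, 5].
    if not -5 <= theta <= 5:
        return 0.0
    return math.exp(-0.5 * theta ** 2)

def metropolis(n_samples, step=1.0):
    theta = 0.0  # start somewhere with nonzero posterior
    samples = []
    for _ in range(n_samples):
        proposal = theta + random.gauss(0, step)
        # P(B) would divide both the numerator and the denominator here,
        # so it cancels -- only unnormalized values are ever needed.
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if random.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(10_000)
print(sum(samples) / len(samples))  # ≈ 0, the true posterior mean
```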