Basic Probability Theory

Table of contents
  1. Sample Spaces and Events
    1. Mutually Exclusive
    2. Partition
  2. Probability
    1. Interpretation of Probability
  3. Random Variable
  4. Probability Distribution
    1. Discrete Probability Distribution
      1. Probability Mass Function
    2. Continuous Probability Distribution
      1. Probability Density Function
    3. Cumulative Distribution Function
      1. Properties of CDF
  5. Quantile Function
  6. Expected Value
    1. Expected Value of Discrete Random Variable
    2. Expected Value of Continuous Random Variable
  7. Variance and Standard Deviation
    1. Variance of Discrete Random Variable
    2. Variance of Continuous Random Variable
  8. Joint Probability
    1. Independent Random Variables
    2. Conditional Probability
  9. Common Probability Distributions
    1. Normal Distribution (Gaussian Distribution)
      1. Characteristics of Normal Distribution
    2. Standard Normal Distribution
      1. Z-Score
      2. Z-Score Table
    3. Others

Sample Spaces and Events

The sample space is the set of all possible outcomes of an experiment.

It is often denoted by $\Omega$.

Elements $\omega \in \Omega$ are called realizations or outcomes.

An event is a subset of the sample space.

Mutually Exclusive

Let $A_i$ be events. Then $A_i$ are mutually exclusive or disjoint if:

$$ A_i \cap A_j = \emptyset \quad \forall i \neq j $$

Disjoint events are not the same as independent events.

Partition

A partition of a sample space $\Omega$ is a set of mutually exclusive events whose union is $\Omega$:

$$ \bigcup_{i} A_i = \Omega $$
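These set definitions are easy to make concrete in code. Below is a minimal Python sketch; the single die roll is my own example, not from the text above:

```python
# Sample space of a single die roll
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
even = {2, 4, 6}
odd = {1, 3, 5}
small = {1, 2, 3}

# Mutually exclusive (disjoint): pairwise intersections are empty
print(even & odd)            # set()  -> disjoint
print(even & small)          # {2}    -> not disjoint

# Partition: mutually exclusive events whose union is the sample space
print(even | odd == omega)   # True   -> {even, odd} partitions omega
```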


Probability

Probability is a function that maps an event (a subset of $\Omega$) to a real number in $[0, 1]$: $P: 2^{\Omega} \rightarrow [0, 1]$.

In general, we use the following notation for the probability of an event $A$:

$$ P(A) $$

Interpretation of Probability

  • Frequentist interpretation: The probability of an event is the limit of the relative frequency of the event as the number of trials goes to infinity (see the simulation sketch after this list).
  • Bayesian interpretation: The probability of an event is the degree of belief.
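The frequentist statement can be illustrated with a quick simulation. A minimal sketch; the fair coin and the trial counts are my own choices:

```python
import random

random.seed(0)

# Relative frequency of heads over a growing number of fair-coin flips
heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # True/False counts as 1/0
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: relative frequency = {heads / n:.4f}")

# As n grows, the relative frequency approaches P(heads) = 0.5.
```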

Random Variable

Let $X$ be a random variable.

Probability of $X$ taking a value $x$ is denoted by:

$$ P(X = x) $$


Probability Distribution

Let $X$ be a random variable.

A probability distribution is a function that maps each value of $X$ to its probability.

In inferential statistics, we assume a certain probability distribution for the population. Then the samples become random variables that follow the same probability distribution.

Discrete Probability Distribution

The probability distribution of a discrete random variable $X$ is essentially the probability of each value of $X$.

Probability Mass Function

The probability mass function (PMF) of a discrete variable $X$ is denoted by:

$$ p_{X}(x) $$

This function maps each value of $X$ to its exact probability.

For discrete random variables, the PMF is the probability distribution.

$$ P(X = x) = p_{X}(x) $$
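As a concrete sketch, the PMF of a fair six-sided die can be written as a plain dictionary (the die is my example, not from the text):

```python
from fractions import Fraction

# PMF of a fair six-sided die: p_X(x) = 1/6 for x in {1, ..., 6}
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

print(p_X[3])             # 1/6, i.e. P(X = 3)
print(sum(p_X.values()))  # 1 -> a valid PMF sums to 1 over all values
```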

Continuous Probability Distribution

The probability distribution of a continuous random variable $X$ is the area under the curve of the probability density function (PDF) of $X$ for a given interval.

Probability Density Function

The probability density function (PDF) of a continuous variable $X$ is denoted by

$$ f_{X}(x) $$

The absolute probability of $X$ taking any specific value is zero, since $X$ can take uncountably many values.

Therefore, the PDF of $X$ is not an absolute probability, but a relative probability per unit range.

Let $f_{X}(x)$ be the PDF of $X$. Then the probability that $X$ takes a value in the interval $[a, b]$ is:

$$ P(a \leq X \leq b) = \int_{a}^{b} f_{X}(x) dx $$
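This integral can be checked numerically. A sketch assuming SciPy is available, with the standard normal PDF as my example:

```python
from scipy.integrate import quad
from scipy.stats import norm

# P(a <= X <= b) is the area under the PDF between a and b
a, b = -1.0, 1.0
area, _ = quad(norm.pdf, a, b)    # integrate f_X(x) from a to b

print(area)                       # ~0.6827
print(norm.cdf(b) - norm.cdf(a))  # same probability via the CDF
```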

Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable $X$ is a non-decreasing function denoted by:

$$ F_{X}(x) = P(X \leq x) $$

The CDF contains all the information about the probability distribution of $X$.

Therefore, for two random variables $X$ and $Y$, if their CDFs are the same, i.e.:

$$ F_{X}(x) = F_{Y}(x) \quad \forall x \in \mathbb{R} $$

then $X$ and $Y$ have the same probability distribution.

Properties of CDF

  • $P(X = x) = F_{X}(x) - F_{X}(x^{-})$, where $F_{X}(x^{-}) = \lim_{t \uparrow x} F_{X}(t)$ is the left-limit of $F_{X}$ at $x$
  • $P(a < X \leq b) = F_{X}(b) - F_{X}(a)$
  • $P(X > x) = 1 - F_{X}(x)$
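These properties are easy to verify numerically. A sketch with SciPy's standard normal (my example):

```python
from scipy.stats import norm

a, b = 0.0, 1.96

# P(a < X <= b) = F(b) - F(a)
print(norm.cdf(b) - norm.cdf(a))  # ~0.4750

# P(X > x) = 1 - F(x)
print(1 - norm.cdf(b))            # ~0.0250
print(norm.sf(b))                 # survival function: same quantity

# For a continuous X, F has no jumps, so P(X = x) = F(x) - F(x-) = 0
```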

Quantile Function

Let $X$ be a random variable with CDF $F_{X}(x)$.

The quantile function or inverse CDF of $X$ is:

$$ F_{X}^{-1}(q) = \inf \{ x : F_{X}(x) > q \} $$

where $q \in [0, 1]$.
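SciPy exposes the quantile function as ppf (percent point function); a quick sketch with the standard normal (my example):

```python
from scipy.stats import norm

# Quantile function (inverse CDF) of the standard normal
print(norm.ppf(0.5))    # 0.0   -> the median
print(norm.ppf(0.975))  # ~1.96

# ppf inverts cdf
print(norm.cdf(norm.ppf(0.975)))  # 0.975
```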


Expected Value

The expected value of a quantitative random variable $X$ is denoted by:

$$ E(X) $$

It is a weighted average of the values of $X$, where the weights are the probabilities of each value.

Expected Value of Discrete Random Variable

Let $X$ be a discrete random variable. Then:

$$ E(X) = \sum_{x} x \cdot p_{X}(x) $$

Expected Value of Continuous Random Variable

Let $X$ be a continuous random variable. Then:

$$ E(X) = \int x \cdot f_{X}(x) dx $$

The range of the integral is the entire range of $X$.
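Both formulas are easy to check numerically. A sketch using a fair die for the discrete case and the standard normal for the continuous case (both examples are mine):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete: E(X) = sum of x * p_X(x), fair six-sided die
p_X = {x: 1 / 6 for x in range(1, 7)}
print(sum(x * p for x, p in p_X.items()))  # 3.5

# Continuous: E(X) = integral of x * f_X(x) dx, standard normal
mean, _ = quad(lambda x: x * norm.pdf(x), -np.inf, np.inf)
print(round(mean, 6))                      # ~0.0
```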


Variance and Standard Deviation

Variance and standard deviation measure how much the values of a random variable $X$ are spread out from the expected value.

Standard deviation is the square root of variance.

Variance of Discrete Random Variable

Let $X$ be a discrete random variable. Then:

$$ V(X) = \sum_{x} (x - E(X))^{2} \cdot p_{X}(x) $$

Variance of Continuous Random Variable

Let $X$ be a continuous random variable. Then:

$$ V(X) = \int (x - E(X))^{2} \cdot f_{X}(x) dx $$
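The same die and standard normal can be reused to check both variance formulas (a sketch, my examples):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete: V(X) = sum of (x - E(X))^2 * p_X(x), fair six-sided die
p_X = {x: 1 / 6 for x in range(1, 7)}
EX = sum(x * p for x, p in p_X.items())                # 3.5
print(sum((x - EX) ** 2 * p for x, p in p_X.items()))  # ~2.9167

# Continuous: V(X) = integral of (x - E(X))^2 * f_X(x) dx, N(0, 1)
var, _ = quad(lambda x: (x - 0.0) ** 2 * norm.pdf(x), -np.inf, np.inf)
print(var)                                             # ~1.0
```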

Some interesting properties of the variance/standard deviation:

  • $V(X) \geq 0$
  • $V(X) = 0$ if and only if $X$ is a constant
  • The higher the value, the higher the dispersion from the expected value

Joint Probability

Let $X$ and $Y$ be two random variables.

The joint probability of $X$ and $Y$ is denoted by:

$$ P(X, Y) $$

It is also written as $P(X \cap Y)$ or $P(XY)$.

Independent Random Variables

Two random variables $X$ and $Y$ are independent if:

$$ P(X, Y) = P(X) \cdot P(Y) $$

Conditional Probability

The conditional probability of $X$ given $Y$ is denoted by $P(X | Y)$.

$$ P(X | Y) = \frac{P(X, Y)}{P(Y)} $$

You could say that the sample space has shrunk from $\Omega$ to the outcomes where $Y$ occurs.

$P(X | Y) \neq P(Y | X)$ in general.

If $X$ and $Y$ are independent, then:

$$ P(X | Y) = P(X) $$
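A small worked example ties joint, marginal, and conditional probabilities together; the joint table below is made up for illustration:

```python
# Joint probabilities P(X = x, Y = y) for two binary random variables
joint = {
    (0, 0): 0.3, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.2,
}

# Marginals P(X = x) and P(Y = y)
P_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
P_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Independence: P(X, Y) = P(X) * P(Y) must hold for every cell
print(all(abs(joint[x, y] - P_X[x] * P_Y[y]) < 1e-12
          for x in (0, 1) for y in (0, 1)))  # True for this table

# Conditional probability: P(X = 0 | Y = 1) = P(X = 0, Y = 1) / P(Y = 1)
print(joint[0, 1] / P_Y[1])  # 0.6
```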


Common Probability Distributions

As described earlier, we assume a certain probability distribution for the population.

Each probability distribution has its own set of parameters.

These parameters determine the shape of the distribution.

Therefore, knowing the parameters is equivalent to understanding the population.

Below are some common probability distributions and their parameters.

Normal Distribution (Gaussian Distribution)

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution.

It is the most common distribution in statistics.

The normal distribution is characterized by two parameters:

  • $\mu$: Mean (location of distribution)
  • $\sigma$: Standard deviation (width of distribution)

The notation for the normal distribution is:

$$ X \sim N(\mu, \sigma^{2}) $$

The PDF of the normal distribution is:

$$ f_{X}(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x - \mu)^{2}}{2 \sigma^{2}}} $$

Hard to see the exponent in the PDF?

$$ -\frac{(x - \mu)^{2}}{2 \sigma^{2}} $$
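To see the formula in action, here is a direct transcription compared against SciPy (a check of my own; note that SciPy's scale parameter is $\sigma$, not $\sigma^{2}$):

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    # PDF of N(mu, sigma^2), transcribed from the formula above
    return (1 / math.sqrt(2 * math.pi * sigma ** 2)
            * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)))

print(normal_pdf(1.0, mu=0.0, sigma=2.0))
print(norm.pdf(1.0, loc=0.0, scale=2.0))  # same value
```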

Characteristics of Normal Distribution

  • Symmetric around the mean
  • Mean, median, and mode are equal
(Figure: standard deviation diagram of the normal distribution)
  • Approximately 68% of the values fall within $\mu \pm 1 \sigma$
  • Approximately 95% of the values fall within $\mu \pm 2 \sigma$
  • Approximately 99.7% of the values fall within $\mu \pm 3 \sigma$ (see the check below)
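These three percentages follow directly from the CDF; a quick numerical check with SciPy (my sketch):

```python
from scipy.stats import norm

# P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {p:.4f}")

# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```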

Standard Normal Distribution

The standard normal distribution is obtained by transforming a normal distribution of $x$ against the probability density into one of $z$ against the probability density (see Z-Score below).

We can denote the standard normal distribution as:

$$ Z \sim N(0, 1) $$

Z-Score

Standardization or z-score normalization is the process of normalizing a dataset so that

$$ \begin{align*} \mu &= 0 \\ \sigma &= 1 \end{align*} $$

We do this by computing the standard score or z-score of each value:

$$ z = \frac{x - \mu}{\sigma} $$

The z-score essentially measures how many $\sigma$ a value is away from the mean.
The higher the absolute z-score, the further the value is from the mean.
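A sketch of z-score normalization with NumPy; the dataset is made up:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# z = (x - mu) / sigma, applied elementwise
z = (data - data.mean()) / data.std()

print(z)         # [-1.5 -0.5 -0.5 -0.5  0.  0.  1.  2.]
print(z.mean())  # ~0.0
print(z.std())   # 1.0
```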

Z-Score Table

Below is a part of the z-score table.

This table specifically is for left-tailed (negative) z-scores: it shows the probability of a value being less than a given z-score. Since the normal distribution is symmetric, the same logic applies to right-tailed z-scores.

| Z    | .00    | .01    | .02    | .03    | .04    | .05    | .06    | .07    | .08    | .09    |
|------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| -2.1 | .01786 | .01743 | .01700 | .01659 | .01618 | .01578 | .01539 | .01500 | .01463 | .01426 |
| -2.0 | .02275 | .02222 | .02169 | .02118 | .02068 | .02018 | .01970 | .01923 | .01876 | .01831 |
| -1.9 | .02872 | .02807 | .02743 | .02680 | .02619 | .02559 | .02500 | .02442 | .02385 | .02330 |

Let’s use a z-score of 1.96 (in absolute value) as an example.

  1. Find the row for $-1.9$
  2. Find the column for $.06$
  3. The intersection gives the probability $0.025$, i.e. $P(Z < -1.96) = 0.025$

Since the normal distribution is symmetric, we know that the probability of a value being greater than $1.96$ is also $0.025$.

Hence, the probability of a z-score falling outside of $\pm 1.96$ is $0.05$.
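The same lookup can be done without the printed table, assuming SciPy (a sketch):

```python
from scipy.stats import norm

# Left tail: P(Z < -1.96)
print(norm.cdf(-1.96))      # ~0.0250

# Symmetry: P(Z > 1.96) is the same
print(1 - norm.cdf(1.96))   # ~0.0250

# Both tails: P(|Z| > 1.96)
print(2 * norm.cdf(-1.96))  # ~0.0500
```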

Others

To be added

Other common probability distributions include:

  • Discrete Uniform distribution (discrete)
  • Binomial distribution (discrete)
  • Poisson distribution (discrete)
  • Geometric distribution (discrete)
  • Negative binomial distribution (discrete)
  • Uniform distribution (continuous)
  • Exponential distribution (continuous)
  • etc.