Commonly Used Distributions
Discrete Random Variables
Discrete Uniform Distribution
A random variable $X$ is said to have a discrete uniform distribution with parameters $a, b$ ($a \le b$), defining the integer support $\{a, a+1, \dots, b\}$ where $n = b-a+1$, if its PMF is given by:
$$ P(X=x) = \frac{1}{n} \quad \text{for} \quad x \in \{a, a+1, \dots, b\} $$
We denote this as $X \sim \text{DiscreteUniform}(a,b)$.
- Mean: $\frac{a+b}{2}$
- Variance: $\frac{n^2 - 1}{12}$
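As a quick sanity check, the mean and variance formulas can be verified by direct enumeration over the PMF. The parameters below ($a=1$, $b=6$, a fair die) are just an illustrative choice:

```python
import numpy as np

# Illustrative parameters: a fair six-sided die.
a, b = 1, 6
n = b - a + 1
support = np.arange(a, b + 1)
pmf = np.full(n, 1 / n)  # P(X = x) = 1/n on every point of the support

mean = float(np.sum(support * pmf))               # (a + b) / 2 = 3.5
var = float(np.sum((support - mean) ** 2 * pmf))  # (n^2 - 1) / 12 = 35/12
```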
Bernoulli Distribution
A random variable $X$ is said to have a Bernoulli distribution with parameter $p \in [0,1]$ if its PMF is given by:
$$ P(X=x) = p^x (1-p)^{1-x} \quad \text{for} \quad x \in \{0,1\} $$
We denote this as $X \sim \text{Bernoulli}(p)$.
Intuitively, it gives the probability of a success ($1$) or a failure ($0$) in a single trial, which is $p$ and $1-p$ respectively by design.
- Mean: $p$
- Variance: $p(1-p)$
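The mean and variance can also be checked empirically by simulating many Bernoulli trials; $p = 0.3$ below is an arbitrary illustrative choice:

```python
import random

random.seed(0)  # fixed seed so the check is reproducible
p = 0.3  # arbitrary success probability for illustration

trials = [1 if random.random() < p else 0 for _ in range(100_000)]
mean = sum(trials) / len(trials)                          # ≈ p
var = sum((t - mean) ** 2 for t in trials) / len(trials)  # ≈ p * (1 - p)
```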
Binomial Distribution
A random variable $X$ is said to have a binomial distribution with parameters $n \in \mathbb{N}$ and $p \in [0,1]$ if its PMF is given by:
$$ P(X=x) = \binom{n}{x} p^x (1-p)^{n-x} \quad \text{for} \quad x \in \{0,1,\dots,n\} $$
We denote this as $X \sim \text{Binomial}(n,p)$.
Intuitively, it gives the probability of exactly $x$ successes in $n$ independent Bernoulli trials.
- Mean: $np$
- Variance: $np(1-p)$
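The PMF can be evaluated directly with `math.comb`; summing over the support confirms it is a valid PMF and reproduces the mean and variance formulas ($n = 10$, $p = 0.4$ are arbitrary illustrative values):

```python
from math import comb

n, p = 10, 0.4  # arbitrary parameters for illustration

# P(X = x) = C(n, x) p^x (1-p)^(n-x) over the whole support {0, ..., n}
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

total = sum(pmf)                                           # = 1
mean = sum(x * q for x, q in enumerate(pmf))               # n * p = 4.0
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # n * p * (1 - p) = 2.4
```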
Geometric Distribution
A random variable $X$ is said to have a geometric distribution with parameter $p \in (0,1]$ if its PMF is given by:
$$ P(X=x) = p(1-p)^{x-1} \quad \text{for} \quad x \in \{1, 2, \dots\} $$
We denote this as $X \sim \text{Geometric}(p)$.
Intuitively, it gives the probability that the first success occurs on trial $x$, i.e., after $x-1$ failures.
- Mean: $\frac{1}{p}$
- Variance: $\frac{1-p}{p^2}$
- Geometric distribution is memoryless.
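Memorylessness means $P(X > s + t \mid X > s) = P(X > t)$: having already waited through $s$ failed trials tells you nothing about how much longer you will wait. This follows directly from the survival function $P(X > x) = (1-p)^x$, as a short check illustrates (with an arbitrary $p$):

```python
p = 0.25  # arbitrary success probability for illustration

def survival(x: int) -> float:
    """P(X > x): no success in the first x trials."""
    return (1 - p) ** x

s, t = 3, 5
# P(X > s + t | X > s) reduces to P(X > t)
conditional = survival(s + t) / survival(s)
```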
Poisson Distribution
A random variable $X$ is said to have a Poisson distribution with parameter $\lambda > 0$ if its PMF is given by:
$$ P(X=x) = e^{-\lambda} \frac{\lambda^x}{x!} \quad \text{for} \quad x \in \{0, 1, 2, \dots\} $$
We denote this as $X \sim \text{Poisson}(\lambda)$.
Intuitively, it gives the probability of observing $x$ rare events in a given time interval, where $\lambda$ is the average number of events per interval.
- Mean: $\lambda$
- Variance: $\lambda$
Relation to Binomial Distribution
The Poisson distribution is useful when there is no fixed upper bound on the number of events (unlike the $n$ in the binomial distribution).
When $n$ is large and $p$ is small, the binomial distribution can be approximated by the Poisson distribution. To be more specific, a random variable $X \sim \text{Binomial}(n,p)$, can be approximated with $X \sim \text{Poisson}(\lambda)$ where $\lambda = np$.
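The quality of the approximation is easy to check numerically by comparing the two PMFs point by point; $n = 1000$, $p = 0.003$ below are arbitrary "large $n$, small $p$" values:

```python
from math import comb, exp, factorial

n, p = 1000, 0.003  # large n, small p (illustrative values)
lam = n * p         # matching Poisson rate, λ = np = 3

def binom_pmf(x: int) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int) -> float:
    return exp(-lam) * lam**x / factorial(x)

# Largest pointwise gap between the two PMFs over the bulk of the support
max_diff = max(abs(binom_pmf(x) - poisson_pmf(x)) for x in range(20))
```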
Sum of Poisson Random Variables
If $X_1 \sim \text{Poisson}(\lambda_1)$ and $X_2 \sim \text{Poisson}(\lambda_2)$ are independent, then:
$$ X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2) $$
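This can be verified by convolving the two PMFs directly: $P(X_1 + X_2 = k) = \sum_{i=0}^{k} P(X_1 = i)\, P(X_2 = k - i)$. The rates below are arbitrary illustrative values:

```python
from math import exp, factorial

def poisson_pmf(lam: float, x: int) -> float:
    return exp(-lam) * lam**x / factorial(x)

lam1, lam2 = 2.0, 3.5  # arbitrary rates for illustration
k = 4

# P(X1 + X2 = k) via direct convolution of the two PMFs
convolved = sum(poisson_pmf(lam1, i) * poisson_pmf(lam2, k - i) for i in range(k + 1))
# PMF of Poisson(λ1 + λ2) evaluated at k
direct = poisson_pmf(lam1 + lam2, k)
```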
Continuous Random Variables
Uniform Distribution
A random variable $X$ is said to have a uniform distribution with parameters $a, b$, defining a range of $[a,b]$, if its PDF is given by:
$$ f(x) = \frac{1}{b-a} \quad \text{for} \quad x \in [a,b] $$
We denote this as $X \sim \text{Uniform}(a,b)$.
- Mean: $\frac{a+b}{2}$
- Variance: $\frac{(b-a)^2}{12}$
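A Monte Carlo check of the mean and variance formulas, with arbitrary endpoints $a=2$, $b=5$:

```python
import random

random.seed(42)  # fixed seed for reproducibility
a, b = 2.0, 5.0  # arbitrary endpoints for illustration

xs = [random.uniform(a, b) for _ in range(200_000)]
mean = sum(xs) / len(xs)                          # ≈ (a + b) / 2 = 3.5
var = sum((x - mean) ** 2 for x in xs) / len(xs)  # ≈ (b - a)^2 / 12 = 0.75
```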
Normal Distribution
A random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma^2 > 0$ if its PDF is given by:
$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)} \quad \text{for} \quad x \in \mathbb{R} $$
We denote this as $X \sim \text{Normal}(\mu, \sigma^2)$.
- Mean: $\mu$
- Variance: $\sigma^2$
Exponential Distribution
A random variable $X$ is said to have an exponential distribution with parameter $\beta > 0$ if its PDF is given by:
$$ f(x) = \frac{1}{\beta} e^{-x/\beta} \quad \text{for} \quad x > 0 $$
We denote this as $X \sim \text{Exp}(\beta)$.
With Lambda
Same idea, but some people use $\lambda = 1/\beta$ as the parameter instead:
$$ f(x) = \lambda e^{-\lambda x} \quad \text{for} \quad x > 0 $$
Intuitively, it measures the distance or time between rare events that occur at a constant average rate $\lambda = 1/\beta$; equivalently, $\beta$ is the mean waiting time between events.
- Mean: $\beta$
- Variance: $\beta^2$
- Exponential distribution is memoryless.
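As with the geometric distribution, memorylessness means $P(X > s + t \mid X > s) = P(X > t)$, which follows from the survival function $P(X > x) = e^{-x/\beta}$ (arbitrary $\beta$, $s$, $t$ below):

```python
from math import exp

beta = 2.0  # arbitrary mean waiting time for illustration

def survival(x: float) -> float:
    """P(X > x) = e^{-x / beta}."""
    return exp(-x / beta)

s, t = 1.5, 0.7
conditional = survival(s + t) / survival(s)  # equals survival(t)
```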
$\chi^2$ Distribution
A random variable $X$ is said to have a chi-squared distribution with parameter $k$ degrees of freedom if its PDF is given by:
$$ f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for} \quad x > 0 $$
We denote this as $X \sim \chi^2_k$.
- Mean: $k$
- Variance: $2k$
Note
You rarely need to know the exact formula.
It suffices to know that when $Z_i \sim \text{Normal}(0,1)$ where $i = 1,2,\dots,k$ are independent random variables,
$$ Q = \sum_{i=1}^k Z_i^2 \sim \chi^2_k $$
That is, the sum of squares of $k$ independent standard normal random variables follows a chi-squared distribution with $k$ degrees of freedom.
The chi-squared distribution often appears as the null distribution of test statistics in hypothesis testing.
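The sum-of-squares characterization is easy to check by simulation: sums of $k$ squared standard normals should have mean $\approx k$ and variance $\approx 2k$, matching the chi-squared mean and variance. Here $k = 4$ is an arbitrary choice:

```python
import random

random.seed(1)  # fixed seed for reproducibility
k = 4  # arbitrary degrees of freedom for illustration

# Each sample is a sum of k squared independent standard normals
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(100_000)]
mean = sum(samples) / len(samples)                          # ≈ k
var = sum((q - mean) ** 2 for q in samples) / len(samples)  # ≈ 2k
```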