Non-Parametric Inference
Parametric inference assumes a specific distributional form for the data, so all we have to do is estimate the parameters of that distribution.
Non-parametric inference makes no such assumption, so we have to estimate the distribution itself (its CDF).
Why do we want to know the CDF?
Because many statistics are functionals, i.e. functions of the CDF. So if we can estimate the CDF, we can estimate those statistics.
Empirical Distribution Function
When we have IID samples $X_1, \dots, X_n \sim F$, we estimate the CDF $F$ by giving mass $1/n$ at each data point.
Think of it as guessing a distribution with a cumulative histogram.
The empirical distribution function, also called the empirical CDF (eCDF), is defined as:
$$ \hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x) $$
where $I(\cdot)$ is the indicator function:
\[I(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is false} \end{cases}\]
So basically, count the data points that are less than or equal to $x$, giving each of them mass $1/n$.
The empirical distribution function is a discrete step function: it jumps by $1/n$ at each data point.
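The definition translates directly to code. A minimal sketch using NumPy (the helper name `ecdf` is illustrative):

```python
import numpy as np

def ecdf(data, x):
    """Empirical CDF at x: the fraction of sample points with X_i <= x,
    i.e. each data point carries mass 1/n."""
    return np.mean(np.asarray(data) <= x)

sample = [2.0, 1.0, 3.0, 2.0]
ecdf(sample, 2.0)  # 3 of the 4 points are <= 2, so this is 0.75
```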
The expected value of indicator function
One thing to note is:
$$ E[I(X_i \leq x)] = \Pr(X_i \leq x) $$
The proof is trivial: the indicator equals $1$ with probability $\Pr(X_i \leq x)$ and $0$ otherwise, so the only term that survives the calculation is $1 \cdot \Pr(X_i \leq x)$.
Relations to True CDF
Since each $I(X_i \leq x)$ is an IID $\text{Bernoulli}(F(x))$ random variable, $n \hat{F}_n(x) \sim \text{Binomial}(n, F(x))$, which gives the following.

Expected value:
\[\E(\hat{F}_n(x)) = F(x)\]
Variance:
\[\Var(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}\]
Convergence in probability (by the weak law of large numbers):
\[\hat{F}_n(x) \xrightarrow{P} F(x)\]
So $\hat{F}_n(x)$ is an unbiased and consistent estimator of $F(x)$.
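These properties are easy to check by simulation. A sketch assuming standard normal data, evaluating at $x = 0$ where the true value is $F(0) = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, x = 100, 5000, 0.0

# For each replication, draw n IID N(0,1) samples and evaluate F_n-hat at x = 0
estimates = np.array([np.mean(rng.standard_normal(n) <= x) for _ in range(reps)])

print(estimates.mean())  # close to F(0) = 0.5           (unbiasedness)
print(estimates.var())   # close to 0.5 * 0.5 / 100 = 0.0025
```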
Statistical Functional
A statistical functional is a function of the CDF $F$. It maps a distribution to a real number or vector.
$$ T(F) $$
Mean, variance, median
The mean, variance, and median are all statistical functionals:
\[\begin{align*} \mu &= T(F) = \int x \, dF(x) \\ \sigma^2 &= T(F) = \int x^2 \, dF(x) - \left(\int x \, dF(x)\right)^2 \\ \text{median} &= T(F) = F^{-1}\left(\tfrac{1}{2}\right) \end{align*}\]
Linear Functional
If $T(F) = \int r(x) \, dF(x)$ for some function $r(x)$, then $T$ is called a linear functional.
$\int r(x) dF(x)$ is equivalent to $\E[r(X)]$. Since $\E$ is a linear operator, it makes sense that $T(F)$ is linear.
As the name suggests, a linear functional satisfies the following:
$$ T(aF + bG) = aT(F) + bT(G) $$
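We can check this numerically for a mixture $aF + bG$ with $a + b = 1$ (so the mixture is itself a CDF), using two empirical distributions. The samples and weights below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x_f = rng.normal(0.0, 1.0, size=1000)  # sample defining F-hat
x_g = rng.normal(3.0, 2.0, size=500)   # sample defining G-hat
a, b = 0.3, 0.7                        # mixture weights, a + b = 1
r = lambda x: x**2                     # any r(x) defines a linear functional

# Left side: integrate r against aF + bG directly. The mixture puts mass
# a/len(x_f) on each point of x_f and b/len(x_g) on each point of x_g.
points = np.concatenate([x_f, x_g])
weights = np.concatenate([np.full(len(x_f), a / len(x_f)),
                          np.full(len(x_g), b / len(x_g))])
T_mixture = np.sum(weights * r(points))

# Right side: a*T(F) + b*T(G)
T_split = a * np.mean(r(x_f)) + b * np.mean(r(x_g))

# T_mixture and T_split agree up to floating-point error
```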
Plug-In Estimator
When $\theta = T(F)$ is a statistical functional, the plug-in estimator of $\theta$ is:
$$ \hat{\theta}_n = T(\hat{F}_n) $$
Because we don’t know $F$, we’re just plugging in the empirical distribution $\hat{F}_n$ instead.
For Linear Functional
If $T(F) = \int r(x) dF(x)$, then the plug-in estimator is:
$$ T(\hat{F}_n) = \int r(x) d\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n r(X_i) $$
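A sketch of the plug-in estimator for linear functionals; the Exponential sample (true mean $2$, true variance $4$) is used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=10_000)  # true mean 2, true variance 4

def plug_in(sample, r):
    """Plug-in estimate of T(F) = int r(x) dF(x): the sample average of r(X_i),
    since F_n-hat puts mass 1/n on each data point."""
    return np.mean(r(np.asarray(sample)))

mean_hat = plug_in(sample, lambda x: x)                  # near 2
var_hat = plug_in(sample, lambda x: x**2) - mean_hat**2  # near 4
```

Note that the plug-in variance divides by $n$ rather than $n - 1$, so it is slightly biased, though still consistent.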