Non-Parametric Inference

Parametric inference assumes a specific distribution for the data. So all we have to do is estimate the parameters of the distribution.

Non-parametric inference does not assume a specific distribution for the data, so we have to estimate the distribution itself, i.e., the CDF.

Why do we want to know the CDF?

Because many statistics are functionals, i.e., functions of the CDF. So once we have an estimate of the CDF, we can estimate those statistics.

Table of contents
  1. Empirical Distribution Function
    1. Relation to the True CDF
  2. Statistical Functional
    1. Linear Functional
  3. Plug-In Estimator
    1. For Linear Functional
    2. Finding the Sampling Distribution of the Plug-In Estimator

Empirical Distribution Function

When we have IID samples $X_1, \dots, X_n \sim F$, we estimate the CDF $F$ by giving mass $1/n$ at each data point.

Think of it as estimating the distribution with a cumulative histogram.

The empirical distribution function, also called the empirical CDF (eCDF), is defined as:

$$ \hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x) $$

where $I(\cdot)$ is the indicator function:

\[I(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is false} \end{cases}\]

In other words, count how many data points are less than or equal to $x$, and give each of them mass $1/n$.

The empirical distribution function is a step function: it jumps by $1/n$ at each (distinct) data point.
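
A minimal sketch of the eCDF in Python (NumPy only; the function name `ecdf` is my own):

```python
import numpy as np

def ecdf(data, x):
    """Empirical CDF: the fraction of sample points <= x."""
    data = np.asarray(data)
    # mean of the indicators I(X_i <= x) = (1/n) * sum_i I(X_i <= x)
    return np.mean(data <= x)

samples = np.array([2.1, 0.4, 3.3, 1.7, 2.1])
for x in [0.0, 1.7, 2.5, 4.0]:
    print(f"F_hat({x}) = {ecdf(samples, x):.1f}")
```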

The expected value of the indicator function

One thing to note is:

$$ E[I(X_i \leq x)] = \Pr(X_i \leq x) $$

The proof is immediate: the indicator takes only the values $0$ and $1$, so $E[I(X_i \leq x)] = 1 \cdot \Pr(X_i \leq x) + 0 \cdot \Pr(X_i > x) = \Pr(X_i \leq x)$.

Relation to the True CDF

Expected value:

\[E(\hat{F}_n(x)) = F(x)\]

Variance:

\[\text{Var}(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}\]

Convergence in probability:

\[\hat{F}_n(x) \xrightarrow{P} F(x)\]

So $\hat{F}_n(x)$ is an unbiased and consistent estimator of $F(x)$. All three facts follow because $n\hat{F}_n(x)$ is a sum of $n$ IID Bernoulli($F(x)$) indicators; the convergence in probability is the weak law of large numbers.
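
A quick simulation to check these identities numerically (standard normal data; the choices of $n$, $x$, and the number of replications are mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, x, reps = 50, 0.5, 20_000

# F_hat_n(x) computed on many independent samples of size n
fhat = np.array([np.mean(rng.standard_normal(n) <= x) for _ in range(reps)])

F = norm.cdf(x)  # true F(x) for the standard normal
print("mean of F_hat:", fhat.mean(), " vs  F(x):", F)
print("var  of F_hat:", fhat.var(), " vs  F(x)(1-F(x))/n:", F * (1 - F) / n)
```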


Statistical Functional

A statistical functional is a function of the CDF $F$. It maps a distribution to a real number or vector.

$$ T(F) $$

Mean, variance, median

The mean, variance, and median are all statistical functionals:

\[\begin{align*} \mu &= T(F) = \int x dF(x) \\ \sigma^2 &= T(F) = \int x^2 dF(x) - (\int x dF(x))^2 \\ \text{median} &= T(F) = F^{-1}(\frac{1}{2}) \end{align*}\]
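
For example, if $F$ is the CDF of $\text{Uniform}(0, 1)$, then $dF(x) = dx$ on $[0, 1]$, so

\[\mu = \int_0^1 x \, dx = \frac{1}{2}, \qquad \sigma^2 = \int_0^1 x^2 \, dx - \left(\frac{1}{2}\right)^2 = \frac{1}{12}, \qquad F^{-1}\left(\frac{1}{2}\right) = \frac{1}{2}\]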

Linear Functional

If $T(F) = \int r(x) \, dF(x)$ for some function $r(x)$, then $T$ is called a linear functional.

$\int r(x) \, dF(x)$ is equivalent to $E[r(X)]$. Since $E$ is a linear operator, it makes sense that $T(F)$ is linear.

As the name suggests, a linear functional satisfies the following:

$$ T(aF + bG) = aT(F) + bT(G) $$
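
This follows directly from the linearity of integration:

$$ T(aF + bG) = \int r(x) \, d(aF + bG)(x) = a \int r(x) \, dF(x) + b \int r(x) \, dG(x) = aT(F) + bT(G) $$

(Note that $aF + bG$ is itself a CDF only when $a, b \geq 0$ and $a + b = 1$.)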


Plug-In Estimator

When $\theta = T(F)$ is a statistical functional, the plug-in estimator of $\theta$ is:

$$ \hat{\theta}_n = T(\hat{F}_n) $$

Because we don’t know $F$, we’re just plugging in the empirical distribution $\hat{F}_n$ instead.

For Linear Functional

If $T(F) = \int r(x) dF(x)$, then the plug-in estimator is:

$$ T(\hat{F}_n) = \int r(x) d\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n r(X_i) $$
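
The last equality holds because $\hat{F}_n$ places mass $1/n$ on each sample point, so the integral reduces to a sample average. Here is a sketch in Python of the three plug-in estimators from earlier (exponential data with scale 2 is my own choice; true mean 2, variance 4):

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.exponential(scale=2.0, size=1_000)

# Plug-in mean: integral of x dF_hat(x) = average of the X_i
mean_hat = np.mean(samples)

# Plug-in variance: integral of x^2 dF_hat(x) minus the squared plug-in mean
var_hat = np.mean(samples**2) - mean_hat**2

# Plug-in median: F_hat^{-1}(1/2), i.e. the sample median
median_hat = np.median(samples)

print(mean_hat, var_hat, median_hat)
```

Note that the plug-in variance divides by $n$, not $n - 1$, so it is the (slightly biased) version of the sample variance.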

Finding the Sampling Distribution of the Plug-In Estimator

See Bootstrap