Sample Mean and Variance
Sample Mean
For $X_1, \dots, X_n$ IID random variables, the sample mean is defined as:
$$ \overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i $$
Converges in Probability to Population Mean
The weak law of large numbers states that the sample mean $\overline{X}_n$ converges in probability to the population mean $\mu$ as the number of samples $n$ increases.
$$ \overline{X}_n \xrightarrow{P} \mu $$
Unbiased Estimator of Population Mean
When $\E[X_i] = \mu$, the expected value of the sample mean can be easily calculated using linearity of expectation:
$$ \E[\overline{X}_n] = \E\left[\frac{1}{n} \sum_{i=1}^n X_i\right] = \frac{1}{n} \sum_{i=1}^n \E[X_i] = \mu $$
So $\overline{X}_n$ is an unbiased estimator of the population mean $\mu$.
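As a quick numerical sketch of unbiasedness (the distribution, sample size, and seed here are my own choices, not from the text): if we draw many independent samples of size $n = 5$ and average all the resulting sample means, the average should land near $\mu$.

```python
import numpy as np

# Draw 10,000 independent samples of size n = 5 from Normal(mu = 3, sigma = 2),
# compute each sample mean, and average them: E[X-bar_n] = mu says this
# average should sit near 3.0.
rng = np.random.default_rng(1)
mu = 3.0

sample_means = rng.normal(loc=mu, scale=2.0, size=(10_000, 5)).mean(axis=1)
print(sample_means.mean())  # close to mu = 3.0
```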
Consistent Estimator of Population Mean
Since the sample mean converges in probability to the population mean, the sample mean is said to be consistent.
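The convergence itself can be watched directly in a simulation (a sketch under my own choice of distribution and seed): the running sample mean of IID Exponential(1) draws, whose population mean is $\mu = 1$, drifts toward 1 as $n$ grows.

```python
import numpy as np

# Running sample mean of IID Exponential(1) draws (population mean mu = 1).
# By the weak law of large numbers, running_mean[n-1] approaches 1 as n grows.
rng = np.random.default_rng(0)

x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

# Means after 10, 1,000, and 100,000 samples; the last is closest to 1.
print(running_mean[9], running_mean[999], running_mean[-1])
```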
Variance of the Sample Mean
When $\Var(X_i) = \sigma^2$, the variance of the sample mean can be calculated using the fact that the variance of a sum of independent random variables is the sum of their variances, together with $\Var(aX) = a^2 \Var(X)$:
$$ \Var(\overline{X}_n) = \Var\left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n^2} \sum_{i=1}^n \Var(X_i) = \frac{\sigma^2}{n} $$
The variance of the sample mean decreases as the sample size increases, which matches the intuition that the sample mean becomes more accurate.
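This $1/n$ decay is easy to check empirically (a sketch; the Uniform distribution and replication count are my own choices): for each sample size $n$, the variance of many simulated sample means should match $\sigma^2 / n$, where Uniform(0, 1) has $\sigma^2 = 1/12$.

```python
import numpy as np

# For each n, simulate 20,000 sample means of n Uniform(0,1) draws and
# compare their empirical variance with the theoretical sigma^2 / n.
rng = np.random.default_rng(2)
sigma2 = 1 / 12  # population variance of Uniform(0, 1)

for n in (10, 100, 1000):
    means = rng.uniform(size=(20_000, n)).mean(axis=1)
    print(n, means.var(), sigma2 / n)  # the two columns track each other
```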
Sample Variance
Do not confuse sample variance with variance of the sample mean above.
For $X_1, \dots, X_n$ IID random variables, the (unbiased) sample variance is defined as:
$$ S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \overline{X}_n)^2 $$
Why $n-1$ in the denominator?
You may be wondering why we divide by $n-1$ instead of $n$.
In short: it makes the sample variance an unbiased estimator of the population variance.
Dividing by $n$ is good enough when we are only measuring dispersion in descriptive statistics, but when we use the statistic to estimate the population parameter in inferential statistics, it results in an underestimate of the population variance/standard deviation.
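The underestimation is visible in a small simulation (a sketch with my own choice of distribution and sizes): with samples of size $n = 5$ from a population with $\sigma^2 = 4$, the $n-1$ estimator is correct on average, while dividing by $n$ shrinks the estimate by a factor of $(n-1)/n$.

```python
import numpy as np

# 100,000 samples of size n = 5 from Normal(0, sigma = 2), so sigma^2 = 4.
# numpy's ddof parameter controls the denominator: ddof=1 divides by n - 1
# (unbiased), ddof=0 divides by n (biased low).
rng = np.random.default_rng(3)
samples = rng.normal(loc=0.0, scale=2.0, size=(100_000, 5))

print(samples.var(axis=1, ddof=1).mean())  # near sigma^2 = 4.0
print(samples.var(axis=1, ddof=0).mean())  # near 4 * (5 - 1) / 5 = 3.2
```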
Converges in Probability to Population Variance
The sample variance converges in probability to the population variance $\sigma^2$ as number of samples $n$ increases.
$$ S_n^2 \xrightarrow{P} \sigma^2 $$
Since the square root is a continuous function, by the continuous mapping property of convergence in probability,
$$ S_n \xrightarrow{P} \sigma $$
holds as well.
Unbiased Estimator of Population Variance
When $\E[X_i] = \mu$ and $\Var(X_i) = \sigma^2$, the expected value of the sample variance is:
$$ \E[S_n^2] = \sigma^2 $$
So $S_n^2$ is an unbiased estimator of the population variance $\sigma^2$.
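As a sketch of why this holds: expanding the sum of squared deviations and using $\E[X_i^2] = \sigma^2 + \mu^2$ and $\E[\overline{X}_n^2] = \Var(\overline{X}_n) + \mu^2 = \sigma^2/n + \mu^2$ gives

$$ \E\left[\sum_{i=1}^n (X_i - \overline{X}_n)^2\right] = \E\left[\sum_{i=1}^n X_i^2 - n \overline{X}_n^2\right] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2 $$

so dividing by $n-1$ (rather than $n$) yields exactly $\sigma^2$ in expectation.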
Consistent Estimator of Population Variance
Since the sample variance converges in probability to the population variance, the sample variance is said to be consistent.
Standard Error of the Sample Mean
Standard error of the sample mean (SEM) is the standard deviation of the sample mean.
“Standard error” does not always relate to the sample mean. This term is used to describe the standard deviation of any statistic.
We calculated above that:
$$ \Var(\overline{X}_n) = \frac{\sigma^2}{n} $$
and thus the standard error is:
$$ \text{SEM} = \sqrt{\Var(\overline{X}_n)} = \frac{\sigma}{\sqrt{n}} $$
However, in many cases the population standard deviation $\sigma$ is unknown, so we use the sample standard deviation $S_n$ from above to estimate the standard error:
$$ \text{SEM} \approx \frac{S_n}{\sqrt{n}} $$
We already mentioned that $S_n \xrightarrow{P} \sigma$ above.
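A small sketch of the estimate in practice (distribution, sample size, and seed are my own choices): from a single sample of $n = 400$ draws with known $\sigma = 3$, the plug-in estimate $S_n / \sqrt{n}$ should come out close to the true SEM $\sigma / \sqrt{n} = 0.15$.

```python
import numpy as np

# One sample of n = 400 draws from Normal(10, sigma = 3). In practice sigma
# is unknown, so the SEM is estimated with the sample standard deviation
# (ddof=1, matching the n - 1 denominator of S_n above).
rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=3.0, size=400)

sem_estimate = x.std(ddof=1) / np.sqrt(x.size)
sem_true = 3.0 / np.sqrt(400)  # sigma / sqrt(n) = 0.15, known here only because we chose sigma
print(sem_estimate, sem_true)
```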