Statistical Hypothesis Testing

Table of contents

Different Types of Data Analysis
1. Confirmatory Data Analysis
2. Exploratory Data Analysis
Setting Up a Hypothesis
p-Value
Significance Level $\alpha$
1. Rejection Region
  1. One-Tailed Test / One-Sided Test
  2. Two-Tailed Test / Two-Sided Test
2. Statistically Significant
Different Methods for Hypothesis Testing
Hypothesis Testing and Confidence Interval
Plotting Graphs in Hypothesis Testing
1. Error Bars
2. Indicating Statistical Significance
Errors in Hypothesis Testing

Different Types of Data Analysis

Confirmatory Data Analysis

Confirmatory data analysis is when the researcher already has a clear hypothesis and tries to test whether the data confirms or rejects it.

For example, a researcher may hypothesize that a new drug is more effective than a placebo.

Then, the researcher would collect data and perform statistical tests to compare the two groups.

This page focuses on confirmatory data analysis.

Exploratory Data Analysis

Exploratory data analysis is when the researcher does not have a clear hypothesis and tries to find patterns or trends in the data.

May be used to generate hypotheses for confirmatory data analysis.

Setting Up a Hypothesis

Say we want to test the effectiveness of a new drug.

Let $A$ be population of blood pressure of people who take the new drug, and $B$ be the population of blood pressure of people who take a placebo.

For the drug to be effective, we want $\mu_A \ne \mu_B$, where $\mu_A$ and $\mu_B$ are the population means of $A$ and $B$.

Notice that the hypothesis is about the population, not the sample.

In hypothesis testing, we set up two hypotheses:

Null Hypothesis

The null hypothesis is the negation of the hypothesis we want to confirm.

With our example, the null hypothesis is:

$$ \begin{equation} \label{eq:null-h} \tag{Null Hypothesis} \mu_A = \mu_B \end{equation} $$

This would mean that the drug is not effective.

Alternative Hypothesis

The alternative hypothesis is the hypothesis we want to confirm.

With our example, the alternative hypothesis is:

$$ \begin{equation} \label{eq:alt-h} \tag{Alternative Hypothesis} \mu_A \ne \mu_B \end{equation} $$

This would mean that the drug is effective.

Proof by Contradiction

Hypothesis testing takes the following approach:

Set up the null hypothesis and alternative hypothesis
Assume a distribution where the null hypothesis is true
Calculate a test statistic from the real sample data
Test how this statistic fits into this assumed distribution
If the real data statistic is unlikely when the null hypothesis is true, then we reject the null hypothesis
By rejecting the null hypothesis, we accept the alternative hypothesis
Otherwise, we fail to reject the null hypothesis

Notice the wording: fail to reject the null hypothesis. Failing to reject does not equal accepting the null hypothesis or rejecting the alternative hypothesis.

p-Value

The $p$-value is:

The probability of observing the data in a distribution that assumes the null hypothesis is true.

So if the p-value of a real data is low, then the data is unlikely to occur when the null hypothesis is true. Which in turn means that the null hypothesis is unlikely to be true.

So the next question is:

What is considered a low $p$-value?

Significance Level $\alpha$

Whether we reject the null hypothesis or not depends on the significance level, commonly denoted by $\alpha$.

Most commonly used significance level is $\alpha = 0.05$.

Rejection Region

The rejection region $R$ refers to the area on the left and right tails of the distribution that we would reject the null hypothesis.

More formally

\[R = \{ x \mid T(x) < -c \lor T(x) > c \}\]

Where $T(x)$ is the test statistic and $c$ is the critical value.

For $\alpha = 0.05$, the rejection region would be on each side of the distribution, 2.5% each.

One-Tailed Test / One-Sided Test

One-tailed test is when we reject the null hypothesis by considering only one tail of the distribution, and thus $\alpha / 2$.

Two-Tailed Test / Two-Sided Test

Two-tailed test is when we reject the null hypothesis by considering both tails of the distribution, and thus the whole $\alpha$.

Unless there is a reason to use one-tailed test, it is much more common to use two-tailed test.

Statistically Significant

If the $p$-value is less than $\alpha$, then the data is statistically significant.

$$ p < \alpha $$

Then we reject the null hypothesis and accept the alternative hypothesis.

Otherwise, we have no grounds to reject the null hypothesis.

Different Methods for Hypothesis Testing

Go to page

Hypothesis Testing and Confidence Interval

The relationship between confidence interval and hypothesis testing is like a mirror image.

$$ \text{Significance Level}\; \alpha = 1 - \text{Confidence Level} $$

With a 95 CI, we infer a likely (with 95% chance) range of the population statistic based on the sample statistic computed from real data.

Then we test if our null hypothesis fits into this range.

For instance, with our example null hypothesis, we can calculate a 95 CI for $\mu_A - \mu_B$, and see if it contains 0.

If not, then we reject the null hypothesis.

On the other hand, with hypothesis testing, we first assume that the null hypothesis is true, and then calculate a sample statistic from the real data.

Then we test how this sample statistic fits into this assumed null hypothesis.

For instance, with our example null hypothesis, we obtain a distribution of $\mu_A - \mu_B$ of which the mean is 0.

Then we calculate the probability $p$ of observing the real data, and see if it is less than $\alpha = 0.05$.

If so, then we reject the null hypothesis.

Plotting Graphs in Hypothesis Testing

Error Bars

With bar plots and scatter plots, you’ll often see error bars.

It is important to describe in the legends which metric is used for the error bars, because it can mean different things.

Suppose we have a bar plot with sample mean as the variable,

Standard deviation
- The error bars represent the dispersion of the sample
- Does not represent a probability
Standard error of the mean
- The error bars represent the probability of the sample mean
Confidence interval

Indicating Statistical Significance

In figures and charts, we commonly use asterisks ($*$) to indicate statistical significance.

Although the figure below does not, make sure to indicate what the asterisks mean in the legends.

Commonly used symbols are:

$p < 0.05$: $*$
$p < 0.01$: $**$
$p < 0.001$: $***$
Non-significant: $\text{N.S.}$

Errors in Hypothesis Testing

Go to page