Encoding Categorical Variables
Categorical (qualitative) variables are often non-numeric and must be encoded as numbers before most learning algorithms can use them.
One-Hot Encoding
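For a qualitative variable with $k$ levels, one-hot encoding introduces $k$ binary variables, one per level.
For example, consider a variable with three levels, A, B, and C:

| Level | $x_1$ | $x_2$ | $x_3$ |
|-------|-------|-------|-------|
| A     | 1     | 0     | 0     |
| B     | 0     | 1     | 0     |
| C     | 0     | 0     | 1     |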
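A minimal sketch of one-hot encoding in plain Python, assuming a variable with three levels A, B, and C. The helper name `one_hot_encode` is hypothetical and chosen only for illustration; libraries offer their own equivalents.

```python
def one_hot_encode(values, levels=None):
    """Return (levels, rows) with one binary column per level."""
    if levels is None:
        # Assumption: levels are ordered alphabetically when not given.
        levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


observations = ["A", "B", "C", "A"]
levels, encoded = one_hot_encode(observations)
for value, row in zip(observations, encoded):
    print(value, row)
```

Each row contains exactly one 1, in the column of the level that the observation takes.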

Dummy Variable Trap
There is one issue with one-hot encoding: the dummy variable trap.
If you look at the one-hot encoding of three levels A, B, and C (100, 010, and 001), you can see that the third variable is redundant:
we can infer C from the first two variables alone, because they are both 0 (00) only when the level is C.
This is a problem because the features are now perfectly multicollinear: in every row the dummy variables sum to 1, which duplicates the intercept column of a linear model.
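Most one-hot encoding implementations provide a parameter to automatically drop one of the dummy variables (for example, `drop_first` in pandas `get_dummies`, or `drop` in scikit-learn's `OneHotEncoder`).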
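The trap can be checked numerically. The sketch below assumes a linear model with an intercept and a small hypothetical set of observations (A, B, C, A). It compares the column rank of the design matrix with all three dummies against the one with a dummy dropped.

```python
import numpy as np

# One-hot rows for observations A, B, C, A (columns: is_A, is_B, is_C).
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])
intercept = np.ones((one_hot.shape[0], 1))

# Intercept plus all three dummies: the dummies sum to the intercept column.
X_full = np.hstack([intercept, one_hot])
# Intercept plus two dummies: is_C is dropped, so C is the implicit baseline.
X_dropped = np.hstack([intercept, one_hot[:, :2]])

rank_full = np.linalg.matrix_rank(X_full)
rank_dropped = np.linalg.matrix_rank(X_dropped)
print(X_full.shape[1], rank_full)        # more columns than rank: rank-deficient
print(X_dropped.shape[1], rank_dropped)  # columns equal rank: full column rank
```

With all three dummies the matrix has four columns but only rank three, so the least-squares coefficients are not uniquely determined. Dropping one dummy restores full column rank.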
Dummy Encoding
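Dummy encoding introduces one fewer binary variable than one-hot encoding, i.e., $k - 1$ variables for $k$ levels.

Each dummy represents one non-baseline level. With three levels A, B, and C, A is encoded as $(x_1, x_2) = (1, 0)$, B as $(0, 1)$, and C as $(0, 0)$.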
This avoids the dummy variable trap.
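A minimal sketch of dummy encoding with an explicit baseline, again assuming levels A, B, and C. The helper `dummy_encode` and its `baseline` argument are hypothetical names used for illustration.

```python
def dummy_encode(values, levels, baseline):
    """Encode values with one binary column per non-baseline level."""
    kept = [level for level in levels if level != baseline]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


kept, encoded = dummy_encode(["A", "B", "C"], levels=["A", "B", "C"], baseline="C")
print(kept)     # ['A', 'B']
print(encoded)  # [[1, 0], [0, 1], [0, 0]]
```

Note that library defaults may pick a different baseline. For instance, `drop_first=True` in pandas `get_dummies` removes the first level, so making C the baseline there would require reordering the categories or dropping the C column explicitly.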
Baseline
With dummy encoding, one group is chosen as the baseline (in our example, C).
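Say $x_1$ and $x_2$ are the dummy variables for A and B.
In a linear regression on these dummies:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

When the categorical variable is C, $x_1 = x_2 = 0$, so

$$y = \beta_0 + \epsilon$$

We are left with the baseline model.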
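To make the role of the baseline concrete, the expected response can be written out for each group. This is a sketch under assumed notation: the model is taken to be $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$ with $E[\epsilon] = 0$, where $x_1$ and $x_2$ are the dummies for A and B, and C is the baseline.

```latex
\begin{aligned}
\text{A } (x_1 = 1,\ x_2 = 0):\quad & E[y] = \beta_0 + \beta_1 \\
\text{B } (x_1 = 0,\ x_2 = 1):\quad & E[y] = \beta_0 + \beta_2 \\
\text{C } (x_1 = 0,\ x_2 = 0):\quad & E[y] = \beta_0
\end{aligned}
```

Under these assumptions, $\beta_0$ is the mean response of the baseline group, and each dummy coefficient is a shift away from that mean.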
Interpretation of Dummy Coefficients
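In linear regression, the coefficients $\beta_j$ are usually interpreted as:
The average effect on $y$ of a one-unit increase in $x_j$, holding the other predictors fixed.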
But how do we interpret the coefficients of dummy variables?
The effect is in comparison to the baseline.
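So in our example above:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

where
- $\beta_0$ is the expected $y$ when $x_1 = 0$ and $x_2 = 0$ (C)
- $\beta_1$ is the average effect on $y$ when $x_1 = 1$ (A) compared to C
- $\beta_2$ is the average effect on $y$ when $x_2 = 1$ (B) compared to C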
It does not give you a direct comparison between the effects of A and B.
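The interpretation can be checked on a small example. The sketch below uses hypothetical toy data and NumPy's least-squares solver, with C as the baseline. The variable names and numbers are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: responses grouped by category.
groups = {"A": [10.0, 12.0], "B": [20.0, 22.0], "C": [5.0, 7.0]}

rows, y = [], []
for level, responses in groups.items():
    for response in responses:
        # Columns: intercept, x1 (is A), x2 (is B); C is the baseline.
        rows.append([1.0, float(level == "A"), float(level == "B")])
        y.append(response)

X = np.array(rows)
y = np.array(y)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0, beta1, beta2 = beta

means = {level: float(np.mean(responses)) for level, responses in groups.items()}
print(beta0, means["C"])               # intercept matches the mean of C
print(beta1, means["A"] - means["C"])  # A compared to C
print(beta2, means["B"] - means["C"])  # B compared to C
```

Neither coefficient on its own compares A with B. That difference is `beta1 - beta2`, which has to be computed separately.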
Testing for Significance
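After estimating the coefficients, we test whether each one differs significantly from zero.
Calculate the test statistic (for example, a $t$-statistic) and its p-value for each coefficient.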
Remember that we are comparing each group to the baseline (is there a difference between A and C? B and C?).
You cannot compare the significance of arbitrary pairs of groups, such as A and B, from these tests alone.
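A minimal sketch of these tests with NumPy, using hypothetical toy data and the standard ordinary least squares formulas for the coefficient covariance. The last lines show one common way to compare two non-baseline groups, a contrast on the difference of their coefficients. This is an illustrative choice rather than the only approach, since refitting with a different baseline works as well.

```python
import numpy as np

# Hypothetical toy data; columns of X are intercept, x1 (is A), x2 (is B).
y = np.array([10.0, 12.0, 20.0, 22.0, 5.0, 7.0])
X = np.array([
    [1.0, 1.0, 0.0],  # A
    [1.0, 1.0, 0.0],  # A
    [1.0, 0.0, 1.0],  # B
    [1.0, 0.0, 1.0],  # B
    [1.0, 0.0, 0.0],  # C (baseline)
    [1.0, 0.0, 0.0],  # C (baseline)
])

n, p = X.shape
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
sigma2 = residuals @ residuals / (n - p)    # unbiased residual variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # estimated covariance of beta
std_err = np.sqrt(np.diag(cov_beta))
t_stats = beta / std_err  # each tests one coefficient against zero

# A versus B is not among these tests; it needs its own contrast, beta1 - beta2.
contrast = np.array([0.0, 1.0, -1.0])
t_a_vs_b = (contrast @ beta) / np.sqrt(contrast @ cov_beta @ contrast)

print(t_stats)
print(t_a_vs_b)
```

Each statistic would be compared against a $t$ distribution with $n - p$ degrees of freedom to obtain a p-value.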