Language Model & n-gram
Language Modeling
Language modeling is the task of predicting what word should come next.
Given a sequence of words $x^{(1)}, \dots, x^{(t)}$, we are modeling the probability distribution of the next word:

$$P\big(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}\big)$$

where each $x^{(i)}$ is a word from a fixed vocabulary $V$.

A system that performs this task is called a language model.
n-gram Language Model
Before neural networks, the most common language model was the n-gram model.
n-gram
An n-gram is a consecutive sequence of $n$ words. For example, in "the cat sat", the bigrams ($n=2$) are "the cat" and "cat sat".

To build an n-gram language model, we collect counts of the different n-grams that appear in a large corpus.
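As a minimal sketch (the toy corpus and the function name `extract_ngrams` are illustrative, not from the text), collecting n-grams from a tokenized corpus can look like:

```python
from collections import Counter

def extract_ngrams(tokens, n):
    """Return every consecutive n-word sequence in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# A toy "corpus"; a real model would be trained on millions of sentences.
tokens = "the cat sat on the mat".split()

bigrams = extract_ngrams(tokens, 2)
print(bigrams[:2])                        # [('the', 'cat'), ('cat', 'sat')]
print(Counter(extract_ngrams(tokens, 1))) # unigram counts
```

In practice the counts (not the raw lists) are what the model stores, which is why `Counter` is the natural container here.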
Markov Assumption
Remember that the Markov assumption states that only the current state matters for predicting the next state; earlier states are irrelevant.
An n-gram model applies this assumption to words: the next word $x^{(t+1)}$ depends only on the preceding $n-1$ words:

$$P\big(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}\big) \approx P\big(x^{(t+1)} \mid x^{(t)}, \dots, x^{(t-n+2)}\big)$$

Using conditional probability, the above becomes the ratio of an $n$-gram probability to an $(n-1)$-gram probability:

$$= \frac{P\big(x^{(t+1)}, x^{(t)}, \dots, x^{(t-n+2)}\big)}{P\big(x^{(t)}, \dots, x^{(t-n+2)}\big)}$$

And each of the probabilities on the right side is estimated by counting the n-grams in a large corpus:

$$\approx \frac{\operatorname{count}\big(x^{(t+1)}, x^{(t)}, \dots, x^{(t-n+2)}\big)}{\operatorname{count}\big(x^{(t)}, \dots, x^{(t-n+2)}\big)}$$
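The count-based estimate above can be sketched directly in code (the corpus and the helper names `ngram_counts` and `cond_prob` are illustrative assumptions):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every consecutive n-word sequence in the corpus."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cond_prob(tokens, context, word):
    """Estimate P(word | context) = count(context + word) / count(context)."""
    n = len(context) + 1
    num = ngram_counts(tokens, n)[tuple(context) + (word,)]
    den = ngram_counts(tokens, n - 1)[tuple(context)]
    return num / den if den else 0.0

tokens = "the cat sat on the mat and the cat ran".split()
# "the" occurs 3 times; "the cat" occurs 2 times, so P(cat | the) = 2/3.
print(cond_prob(tokens, ["the"], "cat"))  # 0.666...
```

Note that `cond_prob` returns 0.0 when the context was never seen; the next section discusses why that case is a real problem.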
Sparsity Problem
A sparsity problem occurs when an n-gram needed for the estimate never appears in the corpus, so its count is zero.

As $n$ increases, longer n-grams become less and less likely to have occurred in the corpus, so sparsity gets worse.

When the numerator is zero, we could introduce smoothing, where you add a small pseudo-count $\delta$ to the count of every word in the vocabulary, so no probability is exactly zero.

When the denominator is zero, we could use backoff, where you condition on an even shorter context, e.g. an $(n-1)$-gram instead of an $n$-gram.

Larger $n$ makes both problems more severe, so in practice $n$ is usually kept small, typically 5 or less.
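A minimal sketch of add-$\delta$ smoothing for the zero-numerator case (the corpus and the function `smoothed_prob` are illustrative assumptions, not a standard API):

```python
from collections import Counter

def smoothed_prob(counts, context_counts, vocab_size, context, word, delta=1.0):
    """Add-delta smoothing: give every word a pseudo-count of delta,
    so the numerator is never zero. The denominator grows by
    delta * |V| to keep the distribution normalized."""
    num = counts[tuple(context) + (word,)] + delta
    den = context_counts[tuple(context)] + delta * vocab_size
    return num / den

tokens = "the cat sat on the mat".split()
bigrams = Counter(tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1))
unigrams = Counter((t,) for t in tokens)
vocab = set(tokens)  # 5 distinct words

# "dog" never follows "the" in the corpus, yet gets nonzero probability.
p = smoothed_prob(bigrams, unigrams, len(vocab), ["the"], "dog")
print(p)  # 1/7
```

With $\delta = 1$ this is Laplace (add-one) smoothing; smaller $\delta$ values shift less probability mass away from observed n-grams.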
Storage Problem
Counting all the n-grams in a corpus requires storing every observed n-gram along with its count. The number of stored n-grams grows with both $n$ and the corpus size, so larger models quickly become expensive to store.