Stop Words
Stop Words / Stopwords
- Low semantic significance
- High frequency
- Often filtered out in NLP tasks
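The high-frequency property can be checked by counting tokens. The sketch below uses only the standard library; the sample text and the `tokenize` helper are made up for illustration and are not part of any library.

```python
from collections import Counter
import re

# Hypothetical sample text, invented for this illustration.
sample_text = (
    "The cat sat on the mat. The dog sat on the rug. "
    "A cat and a dog are on the mat and the rug."
)

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

tokens = tokenize(sample_text)
frequencies = Counter(tokens)

# The two most frequent tokens are function words, not content words.
print(frequencies.most_common(2))  # [('the', 6), ('on', 3)]
```

Even in this tiny text, "the" accounts for a quarter of all tokens while carrying little meaning on its own.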
Generic Stop Words
In English, common stop words include a, an, and the.
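import nltk
from nltk.corpus import stopwords
# Download the stop word corpus (only needed once per environment)
nltk.download('stopwords')
# Load the English stop word list
stop_words_list = stopwords.words('english')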
In the NLTK version used here, the English list contains 179 stop words (the count and order may differ between releases). The first ten are:
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
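Once a stop word list is available, filtering is a membership test per token. The following is a minimal sketch rather than a definitive implementation: `remove_stop_words` is a hypothetical helper, and `STOP_WORDS` is a small hand-picked subset so the example runs without downloading anything. In practice it would be `set(stop_words_list)` built from the NLTK list loaded earlier.

```python
import re

# Illustrative subset only (an assumption for this sketch); in practice
# use set(stopwords.words('english')) from NLTK instead.
STOP_WORDS = {"a", "an", "the", "i", "me", "my", "we", "our", "you", "is", "on"}

def remove_stop_words(text, stop_words):
    """Return the lowercase word tokens of `text` that are not in `stop_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stop_words]

filtered = remove_stop_words("The cat is on my mat", STOP_WORDS)
print(filtered)  # ['cat', 'mat']
```

Converting the list to a set keeps each lookup constant-time, which matters when filtering a large corpus.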
Domain-Specific Stop Words
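Words deemed irrelevant in a specific domain because they appear in nearly every document and so do little to distinguish one from another, for example "patient" in clinical notes.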
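One common heuristic for finding domain-specific candidates is document frequency: flag words that occur in most documents of the corpus. This is a sketch under stated assumptions, not a standard API. The function name `domain_stop_word_candidates`, the 0.8 threshold, and the sample notes are all hypothetical, and the resulting candidates should still be reviewed by hand.

```python
from collections import Counter

def domain_stop_word_candidates(documents, min_doc_fraction=0.8):
    """Return words that occur in at least `min_doc_fraction` of the documents."""
    doc_counts = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_counts.update(set(doc.lower().split()))
    threshold = min_doc_fraction * len(documents)
    return sorted(word for word, count in doc_counts.items() if count >= threshold)

# Hypothetical clinical notes, invented for this example.
notes = [
    "patient reports mild headache",
    "patient denies chest pain",
    "patient reports chest tightness",
    "follow-up scheduled for patient",
]
candidates = domain_stop_word_candidates(notes)
print(candidates)  # ['patient']
```

Here "patient" appears in all four notes and is flagged, while "reports" and "chest" appear in only two and are kept.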
When to Remove Stop Words
- To improve computational efficiency
- To reduce noise in the data
- When only the general sense of the corpus is needed