Stop Words

Table of contents
  1. Stop Words / Stopwords
    1. Generic Stop Words
    2. Domain-Specific Stop Words
  2. When to Remove Stop Words

Stop Words / Stopwords

  • Low semantic significance
  • High frequency
  • Often filtered out in NLP tasks
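Filtering can be sketched in a few lines. This is a minimal sketch that assumes plain whitespace tokenization (punctuation is not handled) and a deliberately tiny stop list. `STOP_WORDS` and `remove_stop_words` are hypothetical names used only for illustration. They do not come from any library.

```python
# Hypothetical, deliberately tiny stop list for illustration only
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "and", "to", "in"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop tokens in STOP_WORDS."""
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(remove_stop_words("The cat sat on the mat"))  # ['cat', 'sat', 'mat']
```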

Generic Stop Words

In English, common stop words include:

  • a, an, the
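
NLTK provides a ready-made English list, which can be loaded as follows:

import nltk
from nltk.corpus import stopwords

# Download the stopwords corpus (only needed once per environment)
nltk.download('stopwords')

# List of English stop words as lowercase strings
stop_words_list = stopwords.words('english')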

The NLTK library defines 179 English stop words. The exact number can vary with the version of the downloaded data. Some examples are:

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
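When the list is used for filtering, it is usually converted to a set first. A set makes each membership test constant-time, whereas a list has to be scanned. The sketch below hard-codes the ten example words above so that it runs without NLTK installed. In a real pipeline, you would presumably use `set(stopwords.words('english'))` instead. The NLTK words are lowercase, so the tokens are lowercased before the comparison.

```python
# Assumption: stands in for stopwords.words('english'); only the ten
# example words are hard-coded so this sketch runs without NLTK.
stop_words_list = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours',
                   'ourselves', 'you', "you're"]

# A set gives constant-time membership tests instead of scanning the list
stop_words = set(stop_words_list)

tokens = ["I", "love", "my", "garden"]

# The stop list is lowercase, so compare lowercased tokens
filtered = [tok for tok in tokens if tok.lower() not in stop_words]
print(filtered)  # ['love', 'garden']
```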

Domain-Specific Stop Words

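These are words that are not generic stop words but carry little information within a particular domain, because they appear in nearly every document. For example, in a corpus of medical records, a word such as "patient" may behave like a stop word.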
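One common heuristic is to treat words that occur in a large fraction of a domain's documents as stop words. It is sketched here as an assumption rather than a fixed rule. The function name `domain_stop_words` and the default `max_df` threshold of 0.8 are hypothetical choices for illustration.

```python
from collections import Counter


def domain_stop_words(documents: list[str], max_df: float = 0.8) -> set[str]:
    """Return words whose document frequency exceeds max_df.

    Document frequency is the fraction of documents containing the word,
    counting each word at most once per document.
    """
    doc_freq = Counter()
    for doc in documents:
        # set() so a word repeated within one document is counted once
        doc_freq.update(set(doc.lower().split()))
    n_docs = len(documents)
    return {word for word, df in doc_freq.items() if df / n_docs > max_df}


# Toy corpus of clinical notes: "patient" appears in every note
notes = [
    "patient reports chest pain",
    "patient denies fever",
    "patient stable after surgery",
    "follow up with patient next week",
]
print(domain_stop_words(notes))  # {'patient'}
```

scikit-learn's `CountVectorizer` offers a similar idea through its `max_df` parameter. That parameter drops terms whose document frequency is strictly higher than the given threshold.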


When to Remove Stop Words

  • Improve computational efficiency
  • Reduce noise in the data
  • Only the general sense of the corpus is needed, not its exact wording
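The first two points can be illustrated with a small sketch. In this toy sentence, removing stop words halves the token count (20 tokens become 10). The most frequent remaining words also say more about the content than the raw counts do, because the raw counts are dominated by words like "the". The stop list and the sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical small stop list; a real pipeline might use NLTK's list instead
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in", "on"}

text = (
    "the model is trained on the data and the data is split "
    "into a training set and a test set"
)
tokens = text.split()
content_tokens = [t for t in tokens if t not in STOP_WORDS]

# Fewer tokens to process downstream
print(len(tokens), len(content_tokens))

# Raw counts are dominated by stop words; filtered counts surface content words
print(Counter(tokens).most_common(3))
print(Counter(content_tokens).most_common(3))
```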
