WordNet
Lexical Semantics
One way to allow computers to understand the semantics of words is to use a thesaurus, and WordNet is one such thesaurus.
The WordNet approach represents the lexical semantics of natural language, that is, how the meanings of words are structured, used, and understood.
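It is important to distinguish lexical semantics from distributional semantics (distributional representations): the former relies on hand-built resources such as WordNet, while the latter infers the meaning of a word from the contexts in which it occurs in text.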
Installation
You can download WordNet through the nltk library, which you install first:
conda install nltk
Once the library is installed, download the WordNet data:
import nltk
nltk.download('wordnet')
Download Path
By default, data will be downloaded to /Users/${username}/nltk_data.
To download the data to a specific path, pass download_dir:
nltk.download('wordnet', download_dir='/path/to/download')
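By default, the library searches for data in the following paths (this list comes from a macOS conda environment named nlp, so the exact entries will differ on other systems):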
- '/Users/${username}/nltk_data'
- '/opt/homebrew/Caskroom/miniforge/base/envs/nlp/nltk_data'
- '/opt/homebrew/Caskroom/miniforge/base/envs/nlp/share/nltk_data'
- '/opt/homebrew/Caskroom/miniforge/base/envs/nlp/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
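NLTK exposes this search list as nltk.data.path, an ordinary Python list. If the data was downloaded to a custom directory, one option is to append that directory to the list before the first WordNet query. The following is a minimal sketch of that idea: register_data_dir is a hypothetical helper name, not part of NLTK, and /path/to/download is the same placeholder used above.

```python
def register_data_dir(directory, search_paths=None):
    """Add `directory` to the NLTK data search list if it is not there yet.

    `register_data_dir` is a hypothetical helper, not an NLTK function.
    When `search_paths` is omitted, the real list `nltk.data.path` is used.
    """
    if search_paths is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        search_paths = nltk.data.path
    if directory not in search_paths:
        search_paths.append(directory)
    return search_paths
```

Calling register_data_dir('/path/to/download') with no second argument modifies nltk.data.path in place. Setting the NLTK_DATA environment variable before starting Python is an alternative that needs no code.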
Import WordNet:
from nltk.corpus import wordnet as wn
Synsets
WordNet groups words into sets of synonyms called synsets.
A word can belong to several synsets, one for each of its senses:
wn.synsets('dog')
>> [Synset('dog.n.01'),
Synset('frump.n.01'),
Synset('dog.n.03'),
Synset('cad.n.01'),
Synset('frank.n.02'),
Synset('pawl.n.01'),
Synset('andiron.n.01'),
Synset('chase.v.01')]
You can also query a specific part of speech (POS):
wn.synsets('dog', pos=wn.VERB)
>> [Synset('chase.v.01')]
Select one synset by its name, which has the form lemma.pos.sense_number:
dog = wn.synset('dog.n.01')
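Synset names alone say little about which sense they denote, so it can help to list each candidate with its definition before choosing one. The sketch below does that using the Synset methods name(), pos(), and definition(). The helper names summarize_senses and show_senses are illustrative, not part of NLTK.

```python
def summarize_senses(synsets):
    """Return (name, part of speech, definition) for each synset given."""
    return [(s.name(), s.pos(), s.definition()) for s in synsets]


def show_senses(word):
    """Print one line per sense of `word` (requires the WordNet data)."""
    # Lazy import: only this function needs NLTK and the downloaded corpus.
    from nltk.corpus import wordnet as wn
    for name, pos, definition in summarize_senses(wn.synsets(word)):
        print(f"{name} ({pos}): {definition}")
```

With the data installed, show_senses('dog') prints one line for each of the eight synsets listed above, which makes it easier to see that dog.n.01 is the animal sense.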
Hypernyms
A synset's hypernyms are the more general concepts directly above it (its parents):
dog.hypernyms()
>> [Synset('canine.n.02'), Synset('domestic_animal.n.01')]
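Because every hypernym is itself a synset, the relation can be followed repeatedly until a synset with no hypernyms is reached. The sketch below walks one such chain, always taking the first parent when several exist. That choice is arbitrary and made only to keep the example short, and hypernym_chain is a hypothetical helper name.

```python
def hypernym_chain(synset):
    """Follow the first hypernym at each level and return the names visited."""
    chain = [synset.name()]
    seen = {synset.name()}
    parents = synset.hypernyms()
    while parents:
        synset = parents[0]  # arbitrary: ignore any additional parents
        if synset.name() in seen:  # guard against a cycle in the data
            break
        seen.add(synset.name())
        chain.append(synset.name())
        parents = synset.hypernyms()
    return chain
```

With the data installed, hypernym_chain(wn.synset('dog.n.01')) starts at dog.n.01 and continues through canine.n.02 toward the top of the noun hierarchy. NLTK's own Synset.hypernym_paths() returns every path to the root rather than a single one.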
Hyponyms
A synset's hyponyms are the more specific concepts directly below it (its children):
dog.hyponyms()
>> [...,
Synset('corgi.n.01'),
...]
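hyponyms() returns only the direct children. To collect everything below a synset, the children of the children have to be visited as well. The following is a minimal breadth-first sketch of that traversal. all_hyponyms is a hypothetical helper name, and the function relies only on the Synset methods name() and hyponyms().

```python
from collections import deque


def all_hyponyms(synset):
    """Return the names of every synset below `synset`, nearest levels first."""
    seen = set()
    ordered = []
    queue = deque(synset.hyponyms())
    while queue:
        current = queue.popleft()
        if current.name() in seen:  # a synset can be reachable via two parents
            continue
        seen.add(current.name())
        ordered.append(current.name())
        queue.extend(current.hyponyms())
    return ordered
```

With the data installed, all_hyponyms(wn.synset('dog.n.01')) includes corgi.n.01 as well as the more specific synsets beneath it. NLTK also offers a built-in route to the same result: dog.closure(lambda s: s.hyponyms()).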
Shortcomings of a Thesaurus
- Hard to maintain/keep up-to-date
- Managed by humans, which is costly
- Hard to capture nuances of language
- Subjective