Part-of-speech (POS) tagging is the process of finding the sequence of tags that is most likely to have generated a given word sequence. This process can be modeled with a Hidden Markov Model (HMM), in which the tags are the hidden states that produce the observable output, i.e., the words.
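As a rough sketch of this idea, nltk ships a supervised HMM trainer (HiddenMarkovModelTrainer). The two-sentence training set below is a made-up toy corpus used only to keep the example self-contained; with it, the tagger should recover the DT-NN-VBZ pattern for a new sentence built from known words.
# Loading Libraries
from nltk.tag import HiddenMarkovModelTrainer
# A toy tagged corpus (illustrative placeholder data)
train_data = [
    [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')],
    [('the', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
]
# Training the HMM tagger on the toy corpus
hmm_tagger = HiddenMarkovModelTrainer().train_supervised(train_data)
# Tagging a new sentence made of known words
hmm_tagger.tag(['the', 'cat', 'barks'])
Output:
[('the', 'DT'), ('cat', 'NN'), ('barks', 'VBZ')]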
POS tags are useful for building parse trees, which in turn support named entity recognition (most named entities are nouns) and the extraction of relations between words. POS tagging is also essential for building lemmatizers, which reduce a word to its root form.
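To see why the POS matters for lemmatization, consider nltk's WordNetLemmatizer: without a POS hint it treats the word as a noun and leaves 'running' unchanged, while the pos='v' argument below yields the verb's root form (the WordNet data must be downloaded first, e.g., via nltk.download('wordnet')).
# Loading Libraries
from nltk.stem import WordNetLemmatizer
# Lemmatizing with an explicit verb POS hint
WordNetLemmatizer().lemmatize('running', pos='v')
Output:
'run'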
We note the following types of POS taggers:
- Rule-Based: A dictionary (lexicon) of possible tags for each word is constructed, and hand-written rules pick the right tag from the context (a minimal lookup-based sketch follows this list).
- Statistical: Tag probabilities (e.g., word-tag and tag-sequence frequencies) are estimated from a tagged text corpus.
- Memory-Based: A set of cases is stored in memory, each case containing a word, its context, and a suitable tag.
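As a minimal sketch of the dictionary-lookup idea, nltk's UnigramTagger can be given a hand-built word-to-tag model; the tiny lookup table below is purely illustrative, and a DefaultTagger backoff assigns 'NN' to any word missing from it.
# Loading Libraries
from nltk.tag import UnigramTagger, DefaultTagger
# A hand-built dictionary of word -> tag (illustrative only)
lookup = {'the': 'DT', 'dog': 'NN', 'barks': 'VBZ'}
# Unknown words fall back to the default 'NN' tag
lookup_tagger = UnigramTagger(model=lookup, backoff=DefaultTagger('NN'))
# Tagging a sentence with one out-of-dictionary word
lookup_tagger.tag(['the', 'dog', 'barks', 'loudly'])
Output:
[('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('loudly', 'NN')]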
Example: the simplest baseline, a DefaultTagger that assigns the same tag to every token:
# Loading Libraries
from nltk.tag import DefaultTagger
# Defining a default tagger that assigns 'NN' to every token
tagging = DefaultTagger('NN')
# Tagging
tagging.tag(['Hello', 'World'])
Output:
[('Hello', 'NN'), ('World', 'NN')]