NLTK: Stemming
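A word stem is the base part of a word that remains once endings such as -ing, -ed, or -s are removed. Stemming is a form of linguistic normalization: different forms of a word are reduced to a common stem. For example, the stem of the word waiting is wait.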
Given words, NLTK can find the stems.
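To make the idea concrete, here is a minimal sketch of suffix stripping in plain Python. It is a toy for illustration only and is not the algorithm NLTK uses; the suffix list and the helper name naive_stem are assumptions made up for this example.

```python
# Toy suffix stripper for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, hand-picked suffix list


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["waiting", "waited", "waits", "wait"]:
    print(word, "->", naive_stem(word))
```

A rule set this small breaks quickly: it turns gaming into gam, for instance, which is why a real stemmer such as Porter applies many more rules.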
NLTK – stemming
Start by defining some words:
words = ["game","gaming","gamed","games"]
We import the module:
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
And stem the words in the list using:
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

words = ["game","gaming","gamed","games"]
ps = PorterStemmer()

for word in words:
    print(ps.stem(word))
You can do word stemming for sentences too:
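from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()

sentence = "gaming, the gamers play games"
# word_tokenize needs NLTK's Punkt tokenizer data (installed via nltk.download)
words = word_tokenize(sentence)

# Print each token next to its stem; punctuation tokens pass through unchanged
for word in words:
    print(word + ":" + ps.stem(word))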
NLTK provides other stemming algorithms as well, but Porter (PorterStemmer) is one of the most widely used.
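As a sketch of how another stemmer can be swapped in, the snippet below runs the same word list through PorterStemmer and two other classes from nltk.stem, SnowballStemmer and LancasterStemmer. The dictionary keys and the results variable are names chosen for this example, and the stems may differ from one algorithm to the next.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

words = ["game", "gaming", "gamed", "games"]

# Each stemmer exposes the same stem(word) method, so they are interchangeable here.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),  # Snowball needs a language name
    "lancaster": LancasterStemmer(),
}

# Collect the stems per algorithm so they can be compared side by side.
results = {}
for name, stemmer in stemmers.items():
    results[name] = [stemmer.stem(word) for word in words]
    print(name + ":", results[name])
```

Because all three share the stem method, trying a different algorithm only means changing which object you construct.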