Python: Indonesian stopwords

From OnnoWiki
Revision as of 05:24, 2 February 2017 by Onnowpurbo (talk | contribs)


Another approach is to collect the stopwords yourself.

Run the script

cari-stopwords.py -i input.txt > kumpulan-words.txt
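The source of cari-stopwords.py is not shown on this page. The following is only a minimal sketch of what such a script might do, assuming it lists the most frequent words of the input file as stopword candidates, one per line. The function name `candidate_stopwords` and the `min_count` threshold are hypothetical and not taken from the original script.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for cari-stopwords.py (assumed behaviour, not the original)."""
import argparse
import re
import sys
from collections import Counter


def candidate_stopwords(text, min_count=2):
    """Return words seen at least min_count times, most frequent first."""
    # Assumption: a word is a run of ASCII letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, count in ranked if count >= min_count]


def main(argv=None):
    parser = argparse.ArgumentParser(description="List stopword candidates")
    parser.add_argument("-i", "--input", required=True, help="input text file")
    args = parser.parse_args(argv)
    with open(args.input, encoding="utf-8") as handle:
        text = handle.read()
    for word in candidate_stopwords(text):
        print(word)


# Only act as a command-line tool when arguments are actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Frequent words are only candidates. The list still has to be reviewed by hand, because topic words that occur often in the input will show up in it as well.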

Edit the result

vi kumpulan-words.txt

Remove the words that are not stopwords, that is, the important words that will become nodes in the graph.

Save the stopwords.

Copy the file under a new name

cp kumpulan-words.txt indonesia-sementara

Merge it with the other existing stopword lists into the NLTK stopwords file

rm ~/nltk_data/corpora/stopwords/indonesia
touch ~/nltk_data/corpora/stopwords/indonesia
cp indonesia-id1 ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-angka >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-jam >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-merek >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-politik >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-stemmped >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-sementara >> ~/nltk_data/corpora/stopwords/indonesia
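The same merge can be sketched in Python. This is a minimal sketch, not the author's method: the helper name `merge_stopwords` is hypothetical, and unlike the `cat` commands above it also lowercases the words and drops blank lines and duplicates, which is an added assumption about what is wanted.

```python
from pathlib import Path


def merge_stopwords(sources, target):
    """Merge word-list files into target, one word per line, without duplicates."""
    seen = set()
    merged = []
    for source in sources:
        for line in Path(source).read_text(encoding="utf-8").splitlines():
            word = line.strip().lower()
            # Skip blank lines and words already collected from an earlier file.
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    target = Path(target)
    # Create the target directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

It could be called with the file names used above, for example `merge_stopwords(["indonesia-id1", "indonesia-angka", "indonesia-sementara"], Path.home() / "nltk_data/corpora/stopwords/indonesia")`. Once the file is in place, `stopwords.words('indonesia')` from `nltk.corpus` should return the merged list, assuming NLTK looks for its data in `~/nltk_data`.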