Difference between revisions of "Python: stopwords Indonesia"
Onnowpurbo (talk | contribs)
Latest revision as of 05:24, 2 February 2017
Another approach is to collect the stopwords yourself.
Run the script
cari-stopwords.py -i input.txt > kumpulan-words.txt
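The source of cari-stopwords.py is not shown here. As a rough illustration only, and assuming the script lists the most frequent words of the input as stopword candidates, the idea could be sketched as follows. The function name `candidate_stopwords` and the `top_n` parameter are hypothetical, not taken from the original script.

```python
import re
from collections import Counter


def candidate_stopwords(text, top_n=50):
    """Return the top_n most frequent lowercase words in text.

    Hypothetical helper: it assumes that very frequent words are good
    stopword candidates, which still has to be checked by hand.
    """
    # Keep only runs of ASCII letters; digits and punctuation are dropped.
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _count in Counter(words).most_common(top_n)]


if __name__ == "__main__":
    sample = "saya dan kamu pergi ke pasar dan ke toko dan ke kantor"
    for word in candidate_stopwords(sample, top_n=5):
        print(word)
```

A frequency list like this only proposes candidates; the manual editing step is still needed to take out the words that carry meaning.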
Edit the result
vi kumpulan-words.txt
Remove the words that are not stopwords, that is, the important words that will become nodes in the graph.
Save the stopwords
Rename (copy under a new name)
cp kumpulan-words.txt indonesia-sementara
Merge with the various existing stopword lists
rm ~/nltk_data/corpora/stopwords/indonesia
touch ~/nltk_data/corpora/stopwords/indonesia
cp indonesia-id1 ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-angka >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-jam >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-merek >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-politik >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-stemmped >> ~/nltk_data/corpora/stopwords/indonesia
cat indonesia-sementara >> ~/nltk_data/corpora/stopwords/indonesia
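The same merge can also be done from Python. The sketch below is one possible way, not the author's method. Unlike the plain `cat` commands it also lowercases the words and drops blank lines and duplicates, which is an added assumption about what is wanted. The function name `merge_stopwords` is hypothetical.

```python
from pathlib import Path


def merge_stopwords(sources, target):
    """Concatenate word-list files into target, one word per line.

    Assumption (not in the shell commands): words are lowercased, and
    blank lines and duplicates are dropped, keeping first-seen order.
    """
    seen = set()
    merged = []
    for source in sources:
        for line in Path(source).read_text(encoding="utf-8").splitlines():
            word = line.strip().lower()
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    target = Path(target)
    # Create the corpus directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

With the file names used above, the call would be `merge_stopwords(["indonesia-id1", "indonesia-angka", "indonesia-jam", "indonesia-merek", "indonesia-politik", "indonesia-stemmped", "indonesia-sementara"], Path.home() / "nltk_data/corpora/stopwords/indonesia")`. Assuming NLTK is installed and finds that directory, the merged list should then be readable with `nltk.corpus.stopwords.words("indonesia")`.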