Python: NLTK download corpus
By Onnowpurbo
Latest revision as of 15:34, 5 February 2017
NLTK corpora can be downloaded with a short script, for example download-corpus.py:

    import nltk
    nltk.download()

Run it:

    python download-corpus.py

The downloader menu appears:

    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------

Choose d (Download). The simplest approach is to download every available corpus. The following appears:

    Packages:
      [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
      [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                               2015) subset of the Paraphrase Database.
      [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
      [-] panlex_lite......... PanLex Lite Corpus
      [ ] pe08................ Cross-Framework and Cross-Domain Parser
                               Evaluation Shared Task
      [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                               character properties in Perl
      [ ] porter_test......... Porter Stemmer Test Files
      [-] stopwords........... Stopwords Corpus
      [ ] vader_lexicon....... VADER Sentiment Lexicon
      [ ] wmt15_eval.......... Evaluation data from WMT15

    Collections:
      [-] all-corpora......... All the corpora
      [-] all................. All packages
      [-] book................ Everything used in the NLTK Book

    ([*] marks installed packages; [-] marks out-of-date or corrupt packages)

    Download which package (l=list; x=cancel)?
      Identifier>

At the Identifier> prompt, enter

    all

to fetch everything without picking packages one by one. Note that this consumes a lot of bandwidth. The downloader then prints:

    Downloading collection u'all'
       |
       | Downloading package abc to /home/onno/nltk_data...
       |   Package abc is already up-to-date!
       | Downloading package alpino to /home/onno/nltk_data...
       |   Package alpino is already up-to-date!
       | Downloading package biocreative_ppi to
       |     /home/onno/nltk_data...
       |   Package biocreative_ppi is already up-to-date!
    ...
    ...
    and so on ...

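If the interactive menu is inconvenient (for example on a server), the same download can probably be scripted by passing an identifier straight to nltk.download(). The following is only a minimal sketch, not part of the original page: the helper name download_packages and its downloader parameter are hypothetical and exist only so the function can be tried without network access. It assumes nltk.download() accepts an identifier plus the download_dir and quiet keyword arguments and returns a truthy value on success.

```python
def download_packages(identifiers, download_dir=None, downloader=None):
    """Download each NLTK package or collection without the interactive menu.

    `downloader` is a hypothetical hook for this sketch; by default it is
    nltk.download, imported lazily so the function can be defined and tested
    even where NLTK is not installed.
    """
    if downloader is None:
        import nltk  # requires NLTK to be installed
        downloader = nltk.download

    results = {}
    for identifier in identifiers:
        # quiet=True suppresses the progress log shown in the session above.
        ok = downloader(identifier, download_dir=download_dir, quiet=True)
        results[identifier] = bool(ok)
    return results
```

Calling download_packages(["all"]) would correspond to typing all at the Identifier> prompt, while download_packages(["stopwords"]) would fetch only the Stopwords Corpus and use far less bandwidth.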
The NLTK corpora are stored in

    ~/nltk_data/

The directory ends up fairly large.
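To see how much disk space the downloaded data actually takes, the directory can be measured with the standard library. This is a rough sketch under the assumption that the data lives in ~/nltk_data/ as stated above; the function names directory_size_bytes and format_size are illustrative and are not part of NLTK.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below `root` (0 if it does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total


def format_size(num_bytes):
    """Render a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, format_size(directory_size_bytes(os.path.expanduser("~/nltk_data"))) reports the total in a readable unit; the exact figure depends on which packages were downloaded.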