Python: NLTK download corpus
Onnowpurbo (talk | contribs)
Latest revision as of 15:34, 5 February 2017
The corpora for NLTK can be downloaded with a short script, for example download-corpus.py:
import nltk
nltk.download()
Run it:
python download-corpus.py
The following menu appears:
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Choose d (Download). To avoid any headaches, the plan here is to download every available corpus. The downloader lists the packages and collections:
Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al. 2015) subset of the Paraphrase Database.
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [-] panlex_lite......... PanLex Lite Corpus
  [ ] pe08................ Cross-Framework and Cross-Domain Parser Evaluation Shared Task
  [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0 character properties in Perl
  [ ] porter_test......... Porter Stemmer Test Files
  [-] stopwords........... Stopwords Corpus
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] wmt15_eval.......... Evaluation data from WMT15

Collections:
  [-] all-corpora......... All the corpora
  [-] all................. All packages
  [-] book................ Everything used in the NLTK Book

([*] marks installed packages; [-] marks out-of-date or corrupt packages)

Download which package (l=list; x=cancel)?
  Identifier>
Enter
all
to avoid any headaches, but note that this uses a lot of bandwidth. The output looks like this:
Downloading collection u'all'
   |
   | Downloading package abc to /home/onno/nltk_data...
   |   Package abc is already up-to-date!
   | Downloading package alpino to /home/onno/nltk_data...
   |   Package alpino is already up-to-date!
   | Downloading package biocreative_ppi to
   |     /home/onno/nltk_data...
   |   Package biocreative_ppi is already up-to-date!
...
...
and so on ...
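The interactive menu can also be skipped by passing an identifier straight to nltk.download(). The following is only a minimal sketch: the helper name download_packages and its downloader parameter are made up for illustration and are not part of NLTK. The sketch relies on nltk.download() accepting an identifier together with the download_dir and quiet keyword arguments, and on it returning a true value on success.

```python
def download_packages(identifiers, downloader=None, download_dir=None):
    """Download each identifier and return {identifier: succeeded}.

    `downloader` is a hypothetical hook for this sketch; when it is
    omitted, nltk.download is used.
    """
    if downloader is None:
        # Imported lazily so the helper can be defined without NLTK installed.
        import nltk
        downloader = nltk.download

    results = {}
    for identifier in identifiers:
        # quiet=True suppresses the per-package progress lines;
        # download_dir=None lets NLTK pick its default data directory.
        ok = downloader(identifier, download_dir=download_dir, quiet=True)
        results[identifier] = bool(ok)
    return results


# Example call (needs network access and an installed NLTK):
# download_packages(["stopwords"])   # or ["all"] for every package
```

Downloading a single package such as stopwords instead of all keeps the bandwidth use much lower when only a few corpora are needed.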
The NLTK corpora are stored in
~/nltk_data/
The full collection is fairly large.
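To see how much disk space the downloaded data actually takes, the directory can be measured with the standard library alone. This is a rough sketch, not something NLTK provides: the function name directory_size_bytes is hypothetical, and the ~/nltk_data path is the one mentioned above.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    # os.walk yields nothing for a missing directory, so the result is 0.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Example call:
# size = directory_size_bytes(os.path.expanduser("~/nltk_data"))
# print("%.1f MiB" % (size / (1024.0 * 1024.0)))
```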