Difference between revisions of "Python: NLTK download corpus"
Onnowpurbo (talk | contribs) |
Revision as of 05:17, 2 February 2017
Corpora for NLTK can be downloaded with a script, for example download-corpus.py:
 import nltk
 nltk.download()
Run it:
 python download-corpus.py
The following menu appears:
 NLTK Downloader
 ---------------------------------------------------------------------------
     d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
 ---------------------------------------------------------------------------
Choose d to download. A list of packages and collections like the following appears, ending with a prompt for an identifier:
 Packages:
   [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
   [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                            2015) subset of the Paraphrase Database.
   [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
   [-] panlex_lite......... PanLex Lite Corpus
   [ ] pe08................ Cross-Framework and Cross-Domain Parser
                            Evaluation Shared Task
   [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                            character properties in Perl
   [ ] porter_test......... Porter Stemmer Test Files
   [-] stopwords........... Stopwords Corpus
   [ ] vader_lexicon....... VADER Sentiment Lexicon
   [ ] wmt15_eval.......... Evaluation data from WMT15

 Collections:
   [-] all-corpora......... All the corpora
   [-] all................. All packages
   [-] book................ Everything used in the NLTK Book

 ([*] marks installed packages; [-] marks out-of-date or corrupt packages)

 Download which package (l=list; x=cancel)?
   Identifier>
At the Identifier> prompt, enter
 all
to get everything and save yourself the headache of picking packages one by one, but note that this uses a lot of bandwidth.
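If bandwidth is a concern, the same downloader can also be called with an identifier directly, skipping the interactive menu. The sketch below is one possible way to do that; the helper name download_packages and its download parameter are hypothetical (the parameter exists only so the helper can be tried without network access), while nltk.download(identifier) is the regular NLTK call.

```python
def download_packages(package_ids, download=None):
    """Download each NLTK package or collection by its identifier.

    `download` defaults to nltk.download. It is exposed as a parameter
    only for this sketch, so the helper can be exercised offline.
    """
    if download is None:
        import nltk  # imported lazily, only when a real download is wanted
        download = nltk.download

    results = {}
    for package_id in package_ids:
        # Record whether the downloader reported success for this identifier.
        results[package_id] = bool(download(package_id))
    return results


# Example (needs NLTK and a network connection):
#   download_packages(["stopwords"])   # a single package from the list above
#   download_packages(["all"])         # everything, same as entering "all"
```

Passing only the identifiers you need, such as stopwords, keeps the download small compared with all.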
The data will be stored in
 ~/nltk_data/
It is quite large ..
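To see how large the download actually is, the directory can be measured with the standard library. This is a minimal sketch; the function name directory_size_bytes is hypothetical, and it assumes the data really sits under ~/nltk_data/ as stated above.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    root = Path(root).expanduser()
    if not root.is_dir():
        # Nothing downloaded yet (or a different data directory is in use).
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Example:
#   size_mb = directory_size_bytes("~/nltk_data") / (1024 * 1024)
#   print(f"{size_mb:.1f} MiB")
```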