Python: NLTK download corpus
Onnowpurbo (talk | contribs) |
Latest revision as of 15:34, 5 February 2017
NLTK corpora can be downloaded with a short script, for example download-corpus.py:
# Open the interactive NLTK downloader
import nltk
nltk.download()
Run it:
python download-corpus.py
The downloader menu appears:
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Choose d (Download). To avoid having to track down individual corpora later, we will download all of them. The following list appears:
Packages:
[ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
[ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
2015) subset of the Paraphrase Database.
[ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
[-] panlex_lite......... PanLex Lite Corpus
[ ] pe08................ Cross-Framework and Cross-Domain Parser
Evaluation Shared Task
[-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
character properties in Perl
[ ] porter_test......... Porter Stemmer Test Files
[-] stopwords........... Stopwords Corpus
[ ] vader_lexicon....... VADER Sentiment Lexicon
[ ] wmt15_eval.......... Evaluation data from WMT15
Collections:
[-] all-corpora......... All the corpora
[-] all................. All packages
[-] book................ Everything used in the NLTK Book
([*] marks installed packages; [-] marks out-of-date or corrupt packages)
Download which package (l=list; x=cancel)?
Identifier>
At the Identifier> prompt, enter
all
to fetch everything in one go. Note that this uses a lot of bandwidth. The following output appears:
Downloading collection u'all'
|
| Downloading package abc to /home/onno/nltk_data...
| Package abc is already up-to-date!
| Downloading package alpino to /home/onno/nltk_data...
| Package alpino is already up-to-date!
| Downloading package biocreative_ppi to
| /home/onno/nltk_data...
| Package biocreative_ppi is already up-to-date!
...
...
and so on ...
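The same download can also be requested without the interactive menu by passing an identifier directly to nltk.download(). The sketch below is only an illustration: the helper name download_corpora is hypothetical, and the default identifier stopwords is chosen merely because it appears in the package list above. It assumes that nltk.download() returns True on success and False on failure.

```python
import sys


def download_corpora(identifiers, download_dir=None):
    """Download each NLTK package or collection without the interactive menu.

    download_corpora is a hypothetical helper name, not part of NLTK.
    """
    # Imported here so the helper can be defined even if NLTK is not installed yet.
    import nltk

    results = {}
    for identifier in identifiers:
        # download_dir=None lets NLTK pick its default data directory.
        # The result is assumed to be True on success and False on failure.
        results[identifier] = nltk.download(
            identifier, download_dir=download_dir, quiet=True
        )
    return results


if __name__ == "__main__":
    # Identifiers come from the command line; "stopwords" is just a small default.
    wanted = sys.argv[1:] or ["stopwords"]
    for name, ok in download_corpora(wanted).items():
        print(name, "ok" if ok else "FAILED")
```

Passing all as the identifier should fetch the same collection as typing all at the Identifier> prompt, with the same bandwidth cost.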
The NLTK corpora are stored in
~/nltk_data/
The directory ends up fairly large.
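To see how large the directory actually is on a given machine, its size can be measured with the standard library alone. This is a minimal sketch: the function name directory_size_bytes is hypothetical, and the path ~/nltk_data is taken from the text above (NLTK may be configured to use other locations).

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Path taken from the article; adjust it if NLTK stores data elsewhere.
    data_dir = os.path.expanduser("~/nltk_data")
    if os.path.isdir(data_dir):
        size_mb = directory_size_bytes(data_dir) / (1024 * 1024)
        print("%s: %.1f MiB" % (data_dir, size_mb))
    else:
        print("%s does not exist yet" % data_dir)
```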