Python: NLTK download corpus

From OnnoWiki

Latest revision as of 15:34, 5 February 2017

NLTK corpora can be downloaded with a short script, for example download-corpus.py:

import nltk
nltk.download()

Run it:

python download-corpus.py

The downloader menu appears:

NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------

Choose d to download. Fetching every available corpus is the simplest way to avoid headaches. The following appears:

Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                           2015) subset of the Paraphrase Database.
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [-] panlex_lite......... PanLex Lite Corpus
  [ ] pe08................ Cross-Framework and Cross-Domain Parser
                           Evaluation Shared Task
  [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                           character properties in Perl
  [ ] porter_test......... Porter Stemmer Test Files
  [-] stopwords........... Stopwords Corpus
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] wmt15_eval.......... Evaluation data from WMT15

Collections:
  [-] all-corpora......... All the corpora
  [-] all................. All packages
  [-] book................ Everything used in the NLTK Book

([*] marks installed packages; [-] marks out-of-date or corrupt packages)

Download which package (l=list; x=cancel)?
  Identifier>

Enter

all

to keep things simple, although this uses a lot of bandwidth. The output looks like this:

   Downloading collection u'all'
      | 
      | Downloading package abc to /home/onno/nltk_data...
      |   Package abc is already up-to-date!
      | Downloading package alpino to /home/onno/nltk_data...
      |   Package alpino is already up-to-date!
      | Downloading package biocreative_ppi to
      |     /home/onno/nltk_data...
      |   Package biocreative_ppi is already up-to-date!
...
...
and so on ...
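Because the all collection uses a lot of bandwidth, it may be worth fetching only the packages you need, without the interactive menu. The following is a minimal sketch, not the method used on this page: the helper name download_packages is hypothetical, and it assumes that nltk.download() accepts a package identifier together with the download_dir and quiet keyword arguments and returns a true value on success.

```python
def download_packages(package_ids, download_dir=None):
    """Download each NLTK package by identifier and return the ids that failed.

    download_packages is a hypothetical helper name used for illustration.
    """
    # Imported inside the function so this file can be loaded even when
    # NLTK is not installed yet.
    import nltk

    failed = []
    for package_id in package_ids:
        # quiet=True suppresses the per-package progress lines;
        # download_dir=None lets NLTK pick its default data directory.
        ok = nltk.download(package_id, download_dir=download_dir, quiet=True)
        if not ok:
            failed.append(package_id)
    return failed
```

For example, download_packages(["stopwords", "vader_lexicon"]) would request only those two packages from the list above, and an empty result list means every request was reported as successful.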




The NLTK corpora are stored in

~/nltk_data/

The full set is fairly large.
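To see how much disk space the downloaded data actually occupies, the directory can be measured with the standard library alone. This is a rough sketch: the function names directory_size_bytes and nltk_data_size_mib are hypothetical, and the default location ~/nltk_data is taken from the path mentioned above.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    # os.walk yields nothing for a missing directory, so the result is 0.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def nltk_data_size_mib(root=None):
    """Return the size of the NLTK data directory in MiB."""
    # Assumption: the data lives in ~/nltk_data unless another root is given.
    root = root or os.path.expanduser("~/nltk_data")
    return directory_size_bytes(root) / (1024 * 1024)
```

Calling print(round(nltk_data_size_mib(), 1), "MiB") after a download gives a quick figure for the space used.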