Difference between revisions of "Orange: Corpus Viewer"

From OnnoWiki

Revision as of 09:44, 7 January 2020

Source: https://orange3-text.readthedocs.io/en/latest/widgets/corpusviewer.html

Displays the contents of a corpus.

Input

Corpus: a collection of documents.

Output

Corpus: the documents that contain the queried word.

Corpus Viewer is meant for viewing text files (instances of a corpus). It always produces a corpus as output. If RegExp filtering is used, the widget shows only the matching documents.
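
The filtering behaviour can be illustrated outside Orange with Python's standard re module. This is only a sketch, not the widget's implementation: documents are plain dictionaries, and filter_documents, its parameters, and the sample corpus are hypothetical names made up for illustration. It assumes a document matches when the pattern is found in at least one of the selected fields, and that an empty pattern keeps every document.

```python
import re

def filter_documents(documents, pattern, search_fields):
    """Return the documents whose selected fields match the pattern.

    Hypothetical helper, not part of Orange. An empty pattern keeps
    every document, mirroring the widget's default of no filtering.
    """
    if not pattern:
        return list(documents)
    regex = re.compile(pattern)
    return [
        doc for doc in documents
        if any(regex.search(str(doc.get(field, ""))) for field in search_fields)
    ]

# Made-up sample corpus for illustration only.
corpus = [
    {"title": "The Frog Prince", "text": "A princess meets a frog."},
    {"title": "Hansel and Gretel", "text": "Two children find a house in the forest."},
    {"title": "The Golden Goose", "text": "A simpleton finds a goose in the forest."},
]

# Search only the "text" field for the whole word "forest".
matching = filter_documents(corpus, r"\bforest\b", ["text"])
print(f"Matching: {len(matching)}/{len(corpus)}")
for doc in matching:
    print(doc["title"])
```

Passing several field names, for example ["title", "text"], corresponds to selecting multiple search features: a document is kept as soon as any one of them matches.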

[Image: Corpus-Viewer-stamped.png]


  • Information:
    • Documents: the number of documents in the input.
    • Preprocessed: True if a preprocessor was used, otherwise False. Also reports the number of tokens and types (unique tokens).
    • POS tagged: True if the input contains POS tags, otherwise False.
    • N-grams range: the n-gram range set in Preprocess Text, if any. The default is 1-1 (unigrams).
    • Matching: the number of documents matching the RegExp Filter. All documents are output by default.
  • RegExp Filter: a Python regular expression for filtering documents. By default, no documents are filtered out (the entire corpus is passed to the output).
  • Search Features: the features that the RegExp Filter is applied to. Use Ctrl (Cmd) to select multiple features.
  • Display Features: the features displayed in the viewer. Use Ctrl (Cmd) to select multiple features.
  • Show Tokens & Tags: if tokens and POS tags are present in the input, check this box to display them.
  • Auto commit: if on, changes are communicated automatically. Otherwise, press Commit.
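
The difference between tokens, types, and n-grams in the Information panel can be shown with a few lines of plain Python. This is a rough sketch under stated assumptions: tokenize is a hypothetical, deliberately naive tokenizer (lowercased runs of word characters) and does not reproduce what Preprocess Text does, and the two sample sentences are made up.

```python
import re

def tokenize(text):
    """Naive tokenizer (an assumption): lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample documents for illustration only.
documents = [
    "The frog sat on the log.",
    "The dog sat on the frog.",
]

all_tokens = [token for doc in documents for token in tokenize(doc)]
n_tokens = len(all_tokens)      # every occurrence counts
n_types = len(set(all_tokens))  # each distinct token counts once

print(f"Documents: {len(documents)}")
print(f"Tokens: {n_tokens}, Types: {n_types}")
print(ngrams(tokenize(documents[0]), 2))
```

With an n-gram range of 1-1 only the single tokens are used; a range of 1-2 would add the bigrams produced by ngrams(tokens, 2) to them.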

References

Related Links