Orange: Word Enrichment


Source: https://orange3-text.readthedocs.io/en/latest/widgets/wordenrichment.html


The Word Enrichment widget performs word enrichment analysis on the selected documents.

Input

Corpus: A collection of documents.
Selected Data: Selected instances from corpus.

Output

None

The Word Enrichment widget displays a list of words with lower p-values (higher significance) for the selected subset compared to the entire corpus. A lower p-value indicates a higher likelihood that the word is significant for the selected subset (that is, it does not occur there by chance). FDR (False Discovery Rate) is linked to the p-value and reports the expected proportion of false predictions in the set of predictions, meaning it accounts for false positives in the list of low p-values.
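To make the notion of a per-word p-value concrete, here is a minimal sketch that assumes a hypergeometric test on document counts. The text above does not state which test the widget applies, so treat the test itself, the function name `enrichment_p_value`, and its parameter names as illustrative assumptions.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    Assumed setup (not taken from the widget's code):
    N: number of documents in the whole corpus
    K: number of corpus documents that contain the word
    n: number of documents in the selected subset
    k: number of selected documents that contain the word
    """
    total = comb(N, n)          # ways to draw n documents out of N
    upper = min(K, n)           # the word cannot appear in more documents than this
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# A word found in 3 of the 4 selected documents but in only 3 of 10 overall.
p = enrichment_p_value(k=3, n=4, K=3, N=10)
print(p)  # 7/210, about 0.033
```

A small value here means that drawing 4 documents at random would rarely yield that many documents containing the word, which is the sense in which the word is "enriched" in the subset.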

Word-Enrichment-stamped.png
  • Information on the input.
    • Cluster words are all the tokens from the corpus.
    • Selected words are all the tokens from the selected subset.
    • After filtering reports on the enriched words found in the subset.
  • Filter enables you to filter by:
    • p-value
    • false discovery rate (FDR)
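The two filters can be sketched as follows. This assumes the Benjamini-Hochberg procedure for the FDR column, which is a common choice but is not confirmed by the text above; the helper names and the threshold values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_words(words, p_values, max_p=0.01, max_fdr=0.2):
    """Keep words whose p-value and FDR are both at or below the thresholds."""
    fdr = benjamini_hochberg(p_values)
    return [
        (word, p, q)
        for word, p, q in zip(words, p_values, fdr)
        if p <= max_p and q <= max_fdr
    ]


# Illustrative values only, not real results.
words = ["wall", "great", "women", "families"]
p_values = [0.01, 0.04, 0.03, 0.005]
for word, p, q in filter_words(words, p_values):
    print(word, p, round(q, 3))
```

Tightening either threshold shortens the list: the p-value filter looks at each word on its own, while the FDR filter accounts for how many words were tested at once.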

Example

In the example below, we’ve retrieved recent tweets from the 2016 presidential candidates, Donald Trump and Hillary Clinton. We then preprocessed the tweets to keep only words as tokens and to remove stopwords. Finally, we connected the preprocessed corpus to the Bag of Words widget to get a table with word counts for our corpus.
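Outside the Orange canvas, the preprocessing and counting steps described here can be approximated in plain Python. This is a rough sketch, not the widgets' actual implementation: the regular expression, the tiny `STOPWORDS` set, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "in"}


def tokenize(text):
    """Lowercase the text, keep only alphabetic words, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def bag_of_words(documents):
    """Return one Counter of token counts per document."""
    return [Counter(tokenize(doc)) for doc in documents]


# Made-up sample text, not an actual tweet.
counts = bag_of_words(["The wall is BIG, the wall!"])
print(counts[0])
```

Each `Counter` plays the role of one row in the word-count table: keys are the tokens and values are how often each token occurs in that document.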

Word-Enrichment-Example.png

Then we connected the Corpus Viewer widget to Bag of Words and selected only those tweets that were published by Donald Trump. Note that we marked only Author as the Search feature to retrieve those tweets.

Word Enrichment accepts two inputs: the entire corpus, which serves as a reference, and a selected subset of the corpus on which the enrichment is performed. First connect the Corpus Viewer widget to the Word Enrichment widget (input Matching Docs → Selected Data) and then connect the Bag of Words widget to it (input Corpus → Data). In the Word Enrichment widget we can see the list of words that are more significant for Donald Trump than they are for Hillary Clinton.
