Difference between revisions of "Orange: Word Cloud dari File Text"

From OnnoWiki

Revision as of 13:47, 14 February 2025

A word cloud can be built from plain text (ASCII) files that we already have, as in the workflow below. First, the data from the Text Files widget must be segmented into words using the Segment widget. The segmented output then needs to be converted into a corpus with the Interchange widget so that it can be processed by the text mining toolbox. Before displaying the result as a word cloud, it is a good idea to run the Preprocess Text widget first to filter out words that are not needed, such as conjunctions.


ORANGE-word-cloud.png

The workflow in Orange Data Mining shown in the image follows a text processing and visualization approach using a word cloud. Here’s the step-by-step breakdown:

1. Text Files (Loading Data)

  • The Text Files widget is used to load text data from multiple files.
  • These files contain textual information that will be analyzed.
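Outside of Orange, the same loading step can be sketched in a few lines of Python. This is only a rough analogue, not the widget's own code: the helper name `load_text_files`, the `*.txt` pattern, and the UTF-8 encoding are assumptions made for illustration.

```python
from pathlib import Path


def load_text_files(folder, pattern="*.txt", encoding="utf-8"):
    """Read every file matching `pattern` in `folder` into a dict.

    Hypothetical helper for illustration; it is not part of Orange.
    Keys are file names, values are the full file contents.
    """
    texts = {}
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            # errors="replace" keeps going if a file contains stray bytes.
            texts[path.name] = path.read_text(encoding=encoding, errors="replace")
    return texts
```

Calling `load_text_files("my_folder")` returns one entry per matching file, which plays the role of the data that the Text Files widget passes downstream.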

2. Segment (Splitting Text into Segments)

  • The Segment widget splits the text data into segments. In this workflow the segments are individual words; other units (e.g., sentences, paragraphs, or predefined sections) are also possible.
  • This step helps in structuring the data for further processing.
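To make the idea of word-level segmentation concrete, here is a minimal Python sketch that uses a regular expression. It is an illustration under assumptions, not how the Segment widget is implemented: the pattern `\w+` and the function name `segment_words` are choices made for this example.

```python
import re

# \w+ matches runs of letters, digits and underscores (Unicode-aware in Python 3).
WORD_PATTERN = re.compile(r"\w+")


def segment_words(text):
    """Return a list of (start, end, word) tuples, one per word found."""
    return [(m.start(), m.end(), m.group()) for m in WORD_PATTERN.finditer(text)]


segments = segment_words("Word cloud dari file text.")
words = [word for _, _, word in segments]
```

Keeping the start and end offsets alongside each word means every segment can still be traced back to its position in the original text.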

3. Preprocess Text (Cleaning and Normalization)

  • The Preprocess Text widget processes the segmented text after the Interchange widget has converted it into a corpus.
  • Common preprocessing steps include:
    • Tokenization (splitting text into words),
    • Removing stopwords (common words like "the", "is", etc.),
    • Stemming or lemmatization (reducing words to their base form).
  • This prepares the text data for analysis.
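The three preprocessing steps listed above can be sketched in plain Python. Everything here is a deliberately simplified assumption: the stopword set is a tiny sample, and `naive_stem` is a crude suffix stripper written for this example, not a real stemming or lemmatization algorithm and not what Orange uses internally.

```python
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}  # tiny sample list

SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix list, for illustration only


def naive_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = text.lower().split()                             # tokenization (whitespace only)
    tokens = [t.strip(".,;:!?\"'()") for t in tokens]         # trim surrounding punctuation
    tokens = [t for t in tokens if t and t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]                    # crude stemming


print(preprocess("The dogs walked and the cats jumped."))
# prints ['dog', 'walk', 'cat', 'jump']
```

A suffix stripper this simple will mangle many words, so it should be read only as a picture of what "reducing words to their base form" means.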

4. Word Cloud (Visualizing Key Words)

  • The Word Cloud widget generates a word cloud visualization.
  • The most frequently occurring words appear larger, which helps identify key terms and patterns in the dataset.
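The link between word frequency and display size can be illustrated with a short Python sketch. The linear scaling, the point sizes, and the function name `word_sizes` are assumptions for this example; the scaling that the Word Cloud widget actually applies may differ.

```python
from collections import Counter


def word_sizes(tokens, min_size=10, max_size=48, top_n=50):
    """Map the `top_n` most frequent tokens to font sizes in points.

    Sizes are scaled linearly between `min_size` and `max_size`.
    """
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in counts:
        # If all counts are equal, every word gets the maximum size.
        fraction = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + fraction * (max_size - min_size))
    return sizes
```

With this mapping, a word that occurs as often as the most frequent word gets `max_size`, and the least frequent word in the top list gets `min_size`.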

Summary

This Orange Data Mining workflow loads text files, segments the data into words, converts the segments into a corpus, preprocesses the text to filter out unneeded words, and visualizes the most frequent words using a word cloud. It is useful for text mining, exploratory text analysis, and keyword extraction.



Screenshot from 2020-02-23 13-14-45.png


Screenshot from 2020-02-23 13-20-23.png

In the Preprocess Text widget we can do several things, such as:

  • Convert all letters to lowercase.
  • Remove stop words, i.e. words that carry little meaning, such as the Indonesian conjunctions and prepositions dan, di, ke, dari, etc.
  • Set stop word processing to the Indonesian language.
  • Remove HTML tags.
  • Remove URLs.
  • etc.
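The options listed above (lowercasing, stripping HTML tags and URLs, and filtering Indonesian stop words) can be approximated in a small Python sketch. The regular expressions, the function name `clean_text`, and the six-word stopword set are assumptions for illustration; a real Indonesian stopword list is much longer, and this is not the widget's own implementation.

```python
import re

# Small hand-picked sample; a real Indonesian stopword list is much longer.
STOPWORDS_ID = {"dan", "di", "ke", "dari", "yang", "untuk"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_text(text):
    text = text.lower()                    # lowercase everything
    text = TAG_PATTERN.sub(" ", text)      # drop HTML tags
    text = URL_PATTERN.sub(" ", text)      # drop URLs
    tokens = re.findall(r"[a-z]+", text)   # keep alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS_ID]


sample = "<p>Data DARI Orange di https://orangedatamining.com dan Python</p>"
print(clean_text(sample))
# prints ['data', 'orange', 'python']
```

Tags are removed before URLs so that a closing tag glued to the end of a URL is not swallowed by the URL pattern.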



Interesting Links