Orange: Hierarchical Clustering Example
Source: https://orange.biolab.si/workflows/
The workflow clusters the data items in the iris dataset by first examining the distances between data instances with the Distances widget. The distance matrix is passed to the Hierarchical Clustering widget, which renders the dendrogram. Different parts of the dendrogram can be selected to further analyze the corresponding data.
The Orange Data Mining workflow shown in the figure performs hierarchical clustering and combines it with data visualization and interactive analysis. The steps are:
1. File (Data Loading)
- The process starts by reading the dataset using the File widget.
- The dataset used in this example is the "brown-selected" data, which is available in Orange’s default datasets.
2. Distances (Computing Pairwise Distances)
- The Distances widget computes the pairwise distances (dissimilarities) between data samples.
- It calculates the distances based on numerical attributes.
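The distance computation in this step can be sketched in plain Python. This is only an illustration of what a pairwise distance matrix is, not the widget's implementation: the Euclidean metric, the function name `euclidean_distance_matrix`, and the toy `samples` are assumptions made for the example.

```python
import math

def euclidean_distance_matrix(rows):
    """Return a symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            # Distance is symmetric, so fill both halves of the matrix.
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical toy data: four samples with two numeric attributes each.
samples = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (6.0, 8.0)]
distances = euclidean_distance_matrix(samples)
```

The diagonal stays at zero because every sample is at distance zero from itself; the rest of the matrix is what the later steps consume.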
3. Hierarchical Clustering (Grouping Data)
- The Hierarchical Clustering widget takes the distance matrix, builds a dendrogram, and groups the data into clusters.
- The dendrogram shows which samples are most similar to each other and at what distance groups of samples merge.
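To make the grouping step concrete, here is a minimal sketch of agglomerative (bottom-up) clustering on a precomputed distance matrix. Average linkage is an assumption chosen for the example, and `agglomerative_clusters` and the small `distances` matrix are hypothetical. Stopping at `n_clusters` groups corresponds roughly to cutting a dendrogram at a height that leaves that many branches.

```python
def agglomerative_clusters(distances, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(distances))]

    def average_linkage(a, b):
        # Mean distance over all cross-cluster pairs of samples.
        return sum(distances[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

# Hypothetical distance matrix: samples 0 and 1 are close, as are 2 and 3.
distances = [
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
]
clusters = agglomerative_clusters(distances, n_clusters=2)
```

This brute-force version rescans all cluster pairs on every merge, which is fine for a handful of samples but far slower than what a real library would use.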
4. Distance Map (Visualizing Distances)
- The Distance Map widget visualizes the distances between data samples as a heat map.
- This provides an overview of how similar or different the data samples are.
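The idea behind a distance heat map can be illustrated with a small text rendering, where each cell of the matrix becomes one character and larger distances get darker characters. The function `render_distance_map`, the shade characters, and the example matrix are all hypothetical; the actual widget draws a colored image.

```python
def render_distance_map(distances, shades=" .:*#"):
    """Render a distance matrix as text, one shade character per cell."""
    # Scale by the largest distance; fall back to 1.0 for an all-zero matrix.
    largest = max(max(row) for row in distances) or 1.0
    top = len(shades) - 1
    lines = []
    for row in distances:
        lines.append("".join(shades[round(d / largest * top)] for d in row))
    return "\n".join(lines)

# Hypothetical distance matrix: samples 0 and 1 are close, as are 2 and 3.
distances = [
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
]
heat_map = render_distance_map(distances)
print(heat_map)
```

In the output, the light blocks along the diagonal mark groups of mutually close samples, and the dark off-diagonal blocks mark samples that are far apart.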
5. Data Table (Cluster Exploration)
- The Data Table widget allows users to explore the data interactively.
- Users can select parts of the clustering dendrogram to inspect the data points in different clusters.
6. Box Plot (Cluster Analysis)
- The Box Plot widget enables the comparison of data distributions within clusters.
- It helps in analyzing how different variables are distributed across the clusters.
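The per-cluster comparison can be approximated numerically with the five-number summary that a box plot commonly draws. This is a sketch under assumptions: `box_plot_summary`, the cluster labels `C1` and `C2`, and the values are hypothetical, and the statistics Orange displays may differ in detail.

```python
import statistics

def box_plot_summary(values):
    """Return min, Q1, median, Q3, and max for one group of values."""
    # The "inclusive" method treats the data as the full population.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical values of a single attribute, grouped by cluster label.
by_cluster = {
    "C1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C2": [10.0, 20.0, 30.0, 40.0, 50.0],
}
summaries = {name: box_plot_summary(vals) for name, vals in by_cluster.items()}
```

Comparing the medians and quartile ranges across clusters shows which attributes separate the clusters and which are distributed similarly in all of them.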
7. Interactive Selection and Propagation
- Any changes in cluster selection from the Hierarchical Clustering widget are propagated to the Data Table and Box Plot widgets.
- This makes the workflow interactive, allowing users to analyze data clusters dynamically.
Summary
This Orange workflow loads a dataset, computes distances between data points, applies hierarchical clustering, visualizes the distances as a heat map, and enables interactive exploration through data tables and box plots. It is useful for finding groups of similar samples and for inspecting how those groups differ.