Orange: k-Means

From OnnoWiki

Latest revision as of 09:56, 14 April 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/unsupervised/kmeans.html

The k-Means widget groups items (data) using the k-Means clustering algorithm.

Input

Data: input dataset

Output

Data: dataset with cluster index as a class attribute

The k-Means widget applies the k-Means clustering algorithm to the data and outputs a new dataset in which the cluster index is used as the class attribute. The original class attribute, if it exists, is moved to the meta attributes. Scores of the clustering results for various k are also shown in the widget.
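
To give a rough idea of what such a score measures, the following is a minimal sketch of the silhouette score in plain Python. It is not the widget's implementation. It assumes Euclidean distance and the usual definition in which each point is compared with its own cluster and with the nearest other cluster; the function name and the toy points are made up for illustration.

```python
import math


def silhouette_score(points, labels):
    """Return the mean silhouette value over all points.

    points: list of equal-length numeric tuples
    labels: labels[i] is the cluster index assigned to points[i]
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: average distance to the other members of the same cluster
        # (the distance from p to itself is 0, so only the divisor changes).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest average distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return sum(values) / len(values)


# Toy data (hypothetical): two tight pairs far apart from each other.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # left pair vs. right pair
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster spans both sides
print(good > bad)
```

A score close to 1 means that points sit much closer to their own cluster than to any other, so comparing the score across several values of k is one way to choose the number of clusters.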

KMeans-stamped.png
  • Select the number of clusters.
    • Fixed: algorithm clusters data in a specified number of clusters.
    • Optimized: widget shows clustering scores for the selected cluster range:
      • Silhouette (contrasts average distance to elements in the same cluster with the average distance to elements in other clusters)
      • Inter-cluster distance (measures distances between clusters, normally between centroids)
      • Distance to centroids (measures distances to the arithmetic means of clusters)
  • Select the initialization method (the way the algorithm begins clustering):
    • k-Means++ (the first center is selected randomly; subsequent centers are chosen from the remaining points with probability proportional to the squared distance from the closest center)
    • Random initialization (clusters are assigned randomly at first and then updated with further iterations)
    Re-runs (how many times the algorithm is run from random initial positions; the result with the lowest within-cluster sum of squares will be used) and maximum iterations (the maximum number of iterations within each algorithm run) can be set manually.
  • The widget outputs a new dataset with appended cluster information. Select how to append cluster information (as class, feature or meta attribute) and name the column.
  • If Apply Automatically is ticked, the widget will commit changes automatically. Alternatively, click Apply.
  • Produce a report.
  • Check scores of clustering results for various k.
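
The initialization, re-run and iteration settings above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the code the widget runs: the function names, the parameters `n_runs`, `max_iter` and `seed`, and the toy data are all hypothetical, and distances are assumed to be Euclidean.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """Pick k initial centers from the data, k-Means++ style."""
    centers = [rng.choice(points)]  # first center: uniformly at random
    while len(centers) < k:
        # Squared distance from each point to its closest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the closest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def lloyd(points, centers, max_iter):
    """Alternate assignment and centroid update for at most max_iter rounds."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    # Within-cluster sum of squares (WCSS) of the final assignment.
    wcss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wcss


def kmeans(points, k, n_runs=10, max_iter=300, seed=0):
    """Run k-means n_runs times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        result = lloyd(points, kmeans_pp_init(points, k, rng), max_iter)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy data (hypothetical): two small groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, wcss = kmeans(points, k=2)
print(labels, round(wcss, 3))
```

Because k-Means++ weights candidate centers by squared distance, the second center almost always lands in the other group, and keeping the lowest-WCSS run guards against the occasional poor start.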

Example

We will explore the k-Means widget with the following schema (workflow).

K-MeansClustering-Schema.png.jpeg

First, we load the Iris dataset, divide it into three clusters, and show it in the Data Table widget, where we can observe which instance went into which cluster. The interesting parts are the Scatter Plot and Select Rows widgets.

Since the k-Means widget added the cluster index as a class attribute, the Scatter Plot widget will color the points according to the clusters they are in.

KMeans-Scatterplot.png

What we are really interested in is how well the clusters induced by the (unsupervised) clustering algorithm match the actual classes in the data. We therefore use the Select Rows widget, in which we can select individual classes and have the corresponding points marked in the Scatter Plot widget. The match is perfect for setosa and fairly good for the other two classes.

K-MeansClustering-Example.png

You may have noticed that we left Remove unused values/attributes and Remove unused classes in the Select Rows widget unchecked. This is important: if the Select Rows widget modifies the attributes, it outputs a list of modified instances, and the Scatter Plot widget can no longer compare them to the original data.

Perhaps a simpler way to test the match between the clusters and the original classes is to use the Distributions widget.

K-MeansClustering-Example2.png

The only (minor) problem here is that the Distributions widget only visualizes normal (and not meta) attributes. We solve this with the Select Columns widget: we reinstate the original class Iris as the class and put the cluster index among the attributes.

The match is perfect for setosa: all instances of setosa are in the third cluster (blue). 48 versicolor instances are in the second cluster (red), while two ended up in the first. For virginica, 36 are in the first cluster and 14 in the second.
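
The counts above can be condensed into a single number. The sketch below computes cluster purity, the share of instances that belong to the majority class of their cluster. The names "C1" to "C3" are placeholders for the first, second and third cluster, the helper functions are hypothetical, and the setosa count of 50 is an assumption taken from the standard Iris dataset (50 instances per species) rather than from the text.

```python
# Class-by-cluster counts as reported in the text.
# "C1".."C3" are placeholder names for the first, second and third cluster.
# Assumption: setosa has 50 instances, as in the standard Iris dataset.
counts = {
    "setosa": {"C1": 0, "C2": 0, "C3": 50},
    "versicolor": {"C1": 2, "C2": 48, "C3": 0},
    "virginica": {"C1": 36, "C2": 14, "C3": 0},
}


def majority_class(counts, cluster):
    """Class with the most instances in the given cluster."""
    return max(counts, key=lambda cls: counts[cls].get(cluster, 0))


def purity(counts):
    """Fraction of instances that fall in their cluster's majority class."""
    clusters = {c for row in counts.values() for c in row}
    total = sum(sum(row.values()) for row in counts.values())
    matched = sum(
        max(row.get(c, 0) for row in counts.values()) for c in clusters
    )
    return matched / total


for cluster in ("C1", "C2", "C3"):
    print(cluster, "->", majority_class(counts, cluster))
print(purity(counts))  # (36 + 48 + 50) / 150 = 134 / 150, about 0.893
```

Under these assumptions, 134 of the 150 instances land in a cluster dominated by their own class, which matches the description of a perfect match for setosa and a fairly good one for the other two classes.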

References

Interesting links