Orange: Distances

From OnnoWiki

Latest revision as of 10:29, 6 March 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/unsupervised/distances.html


Computes distances between rows or columns in a dataset.

Input

Data: input dataset

Output

Distances: distance matrix

The Distances widget computes distances between rows or columns in a dataset. By default, the data is normalized so that individual features are treated equally. Normalization is always done column-wise, that is, each column is normalized on its own.
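As a rough illustration of why column-wise normalization matters, the sketch below zero-centers and scales each column before computing Euclidean distances between rows. It is not the widget's implementation: the helper names `normalize_columns` and `euclidean_matrix` and the sample data are hypothetical, and the use of the population standard deviation for scaling is an assumption, since the text only says that values are zero-centered and scaled.

```python
import math
from statistics import mean, pstdev


def normalize_columns(rows):
    """Zero-center and scale each column; constant columns become 0.

    Assumption: scaling divides by the population standard deviation.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def euclidean_matrix(rows):
    """Symmetric matrix of Euclidean distances between all pairs of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical data: the second feature has a much larger scale than the first.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]

raw = euclidean_matrix(data)                        # dominated by feature 2
scaled = euclidean_matrix(normalize_columns(data))  # both features count equally
```

On the raw data, row 0 looks twice as far from row 1 as from row 2, purely because of the second feature's scale. After normalization the two distances are equal.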

Sparse data can only be used with the Euclidean, Manhattan and Cosine metrics.

The distance matrix produced by the Distances widget can be passed to Hierarchical Clustering to uncover groups in the data, to Distance Map or Distance Matrix to visualize the distances (Distance Matrix can be quite slow for larger datasets), to MDS to map the data instances using the distance matrix, and finally saved with Save Distance Matrix. A saved distance file can be loaded with the Distance File widget.

The Distances widget also works well with Orange add-ons. The distance matrix can be fed to Network from Distances (Network add-on) to convert the matrix into a graph, and to Duplicate Detection (Text add-on) to find duplicate documents in the corpus.

Distances-stamped.png
  • Choose whether to measure distances between rows or columns.
  • Choose the Distance Metric:
    • Euclidean (“straight line” distance between two points)
    • Manhattan (the sum of absolute differences for all attributes)
    • Cosine (the cosine of the angle between two vectors of an inner product space)
    • Jaccard (the size of the intersection divided by the size of the union of the sample sets)
    • Spearman (linear correlation between the rank of the values, remapped as a distance in a [0, 1] interval)
    • Spearman absolute (linear correlation between the rank of the absolute values, remapped as a distance in a [0, 1] interval)
    • Pearson (linear correlation between the values, remapped as a distance in a [0, 1] interval)
    • Pearson absolute (linear correlation between the absolute values, remapped as a distance in a [0, 1] interval)
    • Hamming (the number of features at which the corresponding values are different)
    • Bhattacharyya distance (similarity between two probability distributions; not a true distance, as it does not obey the triangle inequality)
  • Normalize the features. Normalization is always done column-wise. Values are zero-centered and scaled. If there are missing values, the widget automatically imputes the average value of the row or the column. The widget works for both numeric and categorical data. For categorical data, the distance is 0 if the two values are the same (‘green’ and ‘green’) and 1 if they are not (‘green’ and ‘blue’).
  • Tick Apply Automatically to automatically commit changes to other widgets. Alternatively, press ‘Apply’.
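The metric descriptions above can be sketched in plain Python. These are textbook formulas, not the widget's code, and every function name is hypothetical. Two points are assumptions: Cosine and Jaccard are turned into distances as one minus the similarity, and correlations are remapped to [0, 1] with (1 - r) / 2. The text only says that a remapping to [0, 1] takes place.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences over all attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """Assumption: distance = 1 - cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def jaccard_distance(a, b):
    """Assumption: distance = 1 - |intersection| / |union| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def hamming(a, b):
    """Number of features at which the two rows differ (also fits categories)."""
    return sum(x != y for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks starting at 1; this sketch assumes there are no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson_distance(a, b):
    """Assumed remapping of the correlation r in [-1, 1] to [0, 1]."""
    return (1.0 - pearson(a, b)) / 2.0


def spearman_distance(a, b):
    """Same remapping, applied to the ranks instead of the raw values."""
    return (1.0 - pearson(ranks(a), ranks(b))) / 2.0
```

For example, `hamming(["green", "small"], ["blue", "small"])` counts one differing feature, which matches the 0/1 rule for categorical values described in the list.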

Examples

The first example shows a typical use of the Distances widget. We use the iris.tab data from the File widget, compute distances between data instances (rows), and pass the result to Hierarchical Clustering. This is a simple workflow for finding groups (clusters) of data instances.
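To give a feel for what happens after the distance matrix leaves the widget, here is a minimal sketch of agglomerative clustering driven only by a precomputed distance matrix. It is not the Hierarchical Clustering widget's code. The function `single_linkage`, the choice of single linkage, and stopping at a fixed number of clusters are all assumptions made for illustration, and a handful of hypothetical 2-D points stands in for iris.tab.

```python
import math


def single_linkage(dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    dist is a square matrix of pairwise distances between instances.
    Cluster-to-cluster distance is the minimum over member pairs.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical instances (rows); distances are computed between rows.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]

three_groups = single_linkage(dist, 3)
two_groups = single_linkage(dist, 2)
```

The clustering step never looks at the original features, only at `dist`, which is why the same distance matrix can also feed the other downstream widgets.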

Distances-Example1-rows.png

Alternatively, we can compute distances between columns and find out how similar our features are.
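Computing distances between columns amounts to treating each feature as a vector over all instances. The sketch below assumes that a plain transpose followed by Euclidean distance is an adequate illustration. The helper `column_distances` and the sample table are hypothetical.

```python
import math


def column_distances(rows):
    """Euclidean distances between the columns (features) of a table."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    return [[math.dist(a, b) for b in columns] for a in columns]


# Hypothetical table: features 0 and 1 carry nearly the same values.
table = [
    [1.0, 1.1, 9.0],
    [2.0, 2.1, 7.0],
    [3.0, 2.9, 8.0],
]
feature_dist = column_distances(table)
```

Here features 0 and 1 end up much closer to each other than either is to feature 2, which is the kind of similarity the column-wise view is meant to reveal.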

Distances-Example1-columns.png

The second example shows how to visualize the resulting distance matrix. A good way to observe data similarity is with Distance Map or MDS.

Distances-Example2.png
