Orange: Feature Statistics

From OnnoWiki
Latest revision as of 11:02, 20 April 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/data/featurestatistics.html


The Feature Statistics widget shows basic statistics for data features.

Input

Data: input data

Output

Reduced data: table containing only selected features
Statistics: table containing statistics of the selected features

The Feature Statistics widget provides a quick way to inspect and find interesting features in a given data set. The screenshot shows the Feature Statistics widget on the heart-disease data set. The feature exerc ind ang was manually changed to a meta variable for illustration purposes.

Feature statistics-stamped.png



  • Info on the current data set size and number and types of features
  • The histograms on the right can be colored by any feature. If the selected feature is categorical, a discrete color palette is used (as shown in the example). If the selected feature is numerical, a continuous color palette is used. The table on the right contains statistics about each feature in the data set. The features can be sorted by each statistic, which we now describe.
  • The feature type: one of categorical, numeric, time, or string.
  • The name of the feature.
  • A histogram of feature values. If the feature is numeric, we appropriately discretize the values into bins. If the feature is categorical, each value is assigned its own bar in the histogram.
  • The central tendency of the feature values. For categorical features, this is the mode. For numeric features, this is the mean.
  • The dispersion of the feature values. For categorical features, this is the entropy of the value distribution. For numeric features, this is the coefficient of variation.
  • The minimum value. This is computed for numerical and ordinal categorical features.
  • The maximum value. This is computed for numerical and ordinal categorical features.
  • The number of missing values in the data.
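
The statistics listed above can be approximated outside the widget. The following is a minimal sketch using only the Python standard library, not the widget's actual implementation. It assumes that missing values are represented by None, that the coefficient of variation uses the population standard deviation, that entropy is measured in bits, and that numeric histograms use equal-width bins. The function names are hypothetical, and each column is assumed to have at least one known value.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def numeric_summary(values):
    """Summarize a numeric column; None marks a missing value."""
    known = [v for v in values if v is not None]
    center = mean(known)
    # Coefficient of variation: standard deviation divided by the mean.
    # Using the population standard deviation is an assumption of this sketch.
    dispersion = pstdev(known) / center if center != 0 else math.nan
    return {
        "center": center,
        "dispersion": dispersion,
        "min": min(known),
        "max": max(known),
        "missing": len(values) - len(known),
    }


def categorical_summary(values):
    """Summarize a categorical column; None marks a missing value."""
    known = [v for v in values if v is not None]
    counts = Counter(known)
    total = len(known)
    # Shannon entropy of the value distribution (base 2 is an assumption).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "center": counts.most_common(1)[0][0],  # the mode
        "dispersion": entropy,
        "missing": len(values) - len(known),
    }


def histogram(values, bins=4):
    """Count numeric values in equal-width bins (binning rule is an assumption)."""
    known = [v for v in values if v is not None]
    low, high = min(known), max(known)
    width = (high - low) / bins or 1.0  # avoid zero width for constant columns
    counts = [0] * bins
    for v in known:
        # The maximum value falls into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts
```

For a categorical column, each distinct value would get its own bar, which corresponds to the `Counter` built in `categorical_summary`.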

Notice also that some rows are colored differently. White rows indicate regular features, gray rows indicate class variables and the lighter gray indicates meta variables.

Example

The Feature Statistics widget is most often used after the File widget to inspect and find potentially interesting features in the given data set. In the following examples, we use the heart-disease data set.

Feature statistics workflow.png

Once we have found a subset of potentially interesting features, or we have found features that we would like to exclude, we can simply select the features we want to keep. The Feature Statistics widget outputs a new data set with only these features.
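
Conceptually, the Reduced data output is a column selection. The sketch below illustrates the idea in plain Python, with a table stored as a dictionary of columns. The `reduced_data` function, the column names, and the values are hypothetical toy data, not the widget's code or the real heart-disease data set.

```python
def reduced_data(table, selected):
    """Return a new table with only the selected columns, in original order."""
    return {name: column for name, column in table.items() if name in selected}


# Toy table with made-up values; column names are illustrative only.
toy_table = {
    "age": [40, 50, 60],
    "cholesterol": [200, 240, 220],
    "gender": ["female", "male", "male"],
}

kept = reduced_data(toy_table, {"age", "gender"})
print(list(kept))  # column names that remain in the reduced table
```

The input table is left unchanged, so the full data set remains available to other parts of a workflow.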

Feature statistics example1.png

Alternatively, if we want to store feature statistics, we can use the Statistics output of the Feature Statistics widget and manipulate those values as needed. In this example, we simply select all the features in the Feature Statistics widget and display the statistics in a Data Table widget.
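
As a rough illustration of storing such statistics, the sketch below builds one row per numeric feature and serializes the rows as CSV text using the Python standard library. The column layout (feature, mean, min, max, missing), the function names, and the toy values are assumptions made for this example. They do not reproduce the exact columns of the widget's Statistics output.

```python
import csv
import io
from statistics import mean

FIELDS = ["feature", "mean", "min", "max", "missing"]


def statistics_rows(table):
    """Build one statistics row per numeric column; None marks a missing value."""
    rows = []
    for name, column in table.items():
        known = [v for v in column if v is not None]
        rows.append({
            "feature": name,
            "mean": mean(known),
            "min": min(known),
            "max": max(known),
            "missing": len(column) - len(known),
        })
    return rows


def to_csv(rows):
    """Serialize the statistics rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy table with made-up values; column names are illustrative only.
toy_table = {"age": [40, 50, None, 60], "cholesterol": [200, 240, 220, None]}
print(to_csv(statistics_rows(toy_table)))
```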

Feature statistics example2.png
