Difference between revisions of "Orange: Discretize"

From OnnoWiki
Jump to navigation Jump to search
Line 16: Line 16:
 
[[File:Discretize-All-stamped.png|center|200px|thumb]]
 
[[File:Discretize-All-stamped.png|center|200px|thumb]]
  
* The basic version of the widget is rather simple. It allows choosing between three different discretizations.
+
* Versi dasar dari widget relatif sederhana. Dia mengijinkan untuk memilih antara tiga metoda diskritisasi.
** Entropy-MDL, invented by Fayyad and Irani is a top-down discretization, which recursively splits the attribute at a cut maximizing information gain, until the gain is lower than the minimal description length of the cut. This discretization can result in an arbitrary number of intervals, including a single interval, in which case the attribute is discarded as useless (removed).
+
** '''Entropy-MDL''', di temukan oleh Fayyad dan Irani adalah top-down discretization, yang secara recursive membagi attribute at a cut maximizing information gain, until the gain is lower than the minimal description length of the cut. This discretization can result in an arbitrary number of intervals, including a single interval, in which case the attribute is discarded as useless (removed).
** Equal-frequency splits the attribute into a given number of intervals, so that they each contain approximately the same number of instances.
+
** '''Equal-frequency''' splits the attribute into a given number of intervals, so that they each contain approximately the same number of instances.
** Equal-width evenly splits the range between the smallest and the largest observed value. The Number of intervals can be set manually.
+
** '''Equal-width evenly''' splits the range between the smallest and the largest observed value. The Number of intervals can be set manually.
 
** The widget can also be set to leave the attributes continuous or to remove them.
 
** The widget can also be set to leave the attributes continuous or to remove them.
 
* To treat attributes individually, go to Individual Attribute Settings. They show a specific discretization of each attribute and allow changes. First, the top left list shows the cut-off points for each attribute. In the snapshot, we used the entropy-MDL discretization, which determines the optimal number of intervals automatically; we can see it discretized the age into seven intervals with cut-offs at 21.50, 23.50, 27.50, 35.50, 43.50, 54.50 and 61.50, respectively, while the capital-gain got split into many intervals with several cut-offs. The final weight (fnlwgt), for instance, was left with a single interval and thus removed. On the right, we can select a specific discretization method for each attribute. Attribute “fnlwgt” would be removed by the MDL-based discretization, so to prevent its removal, we select the attribute and choose, for instance, Equal-frequency discretization. We could also choose to leave the attribute continuous.
 
* To treat attributes individually, go to Individual Attribute Settings. They show a specific discretization of each attribute and allow changes. First, the top left list shows the cut-off points for each attribute. In the snapshot, we used the entropy-MDL discretization, which determines the optimal number of intervals automatically; we can see it discretized the age into seven intervals with cut-offs at 21.50, 23.50, 27.50, 35.50, 43.50, 54.50 and 61.50, respectively, while the capital-gain got split into many intervals with several cut-offs. The final weight (fnlwgt), for instance, was left with a single interval and thus removed. On the right, we can select a specific discretization method for each attribute. Attribute “fnlwgt” would be removed by the MDL-based discretization, so to prevent its removal, we select the attribute and choose, for instance, Equal-frequency discretization. We could also choose to leave the attribute continuous.

Revision as of 07:59, 27 January 2020

Sumber: https://docs.biolab.si//3/visual-programming/widgets/data/discretize.html


Men-diskritisasi atribut continuous dari input dataset.

Input

Data: input dataset

Output

Data: dataset with discretized values

The Discretize widget discretizes continuous attributes with a selected method.

Discretize-All-stamped.png
  • Versi dasar dari widget relatif sederhana. Dia mengijinkan untuk memilih antara tiga metoda diskritisasi.
    • Entropy-MDL, di temukan oleh Fayyad dan Irani adalah top-down discretization, yang secara recursive membagi attribute at a cut maximizing information gain, until the gain is lower than the minimal description length of the cut. This discretization can result in an arbitrary number of intervals, including a single interval, in which case the attribute is discarded as useless (removed).
    • Equal-frequency splits the attribute into a given number of intervals, so that they each contain approximately the same number of instances.
    • Equal-width evenly splits the range between the smallest and the largest observed value. The Number of intervals can be set manually.
    • The widget can also be set to leave the attributes continuous or to remove them.
  • To treat attributes individually, go to Individual Attribute Settings. They show a specific discretization of each attribute and allow changes. First, the top left list shows the cut-off points for each attribute. In the snapshot, we used the entropy-MDL discretization, which determines the optimal number of intervals automatically; we can see it discretized the age into seven intervals with cut-offs at 21.50, 23.50, 27.50, 35.50, 43.50, 54.50 and 61.50, respectively, while the capital-gain got split into many intervals with several cut-offs. The final weight (fnlwgt), for instance, was left with a single interval and thus removed. On the right, we can select a specific discretization method for each attribute. Attribute “fnlwgt” would be removed by the MDL-based discretization, so to prevent its removal, we select the attribute and choose, for instance, Equal-frequency discretization. We could also choose to leave the attribute continuous.
  • Produce a report.
  • Tick Apply automatically for the widget to automatically commit changes. Alternatively, press Apply.

Contoh

Dalam skema di bawah ini, kami menunjukkan dataset Iris dengan atribut kontinu (seperti dalam file data asli) dan dengan atribut diskritisasi.

Discretize-Example.png


Referensi

Pranala Menarik