Orange: Discretize
Revision as of 07:59, 27 January 2020 by Onnowpurbo
Source: https://docs.biolab.si//3/visual-programming/widgets/data/discretize.html
Discretizes continuous attributes of the input dataset.
Input
Data: input dataset
Output
Data: dataset with discretized values
The Discretize widget discretizes continuous attributes with a selected method.
- The basic version of the widget is relatively simple. It lets you choose between three discretization methods.
- Entropy-MDL, invented by Fayyad and Irani, is a top-down discretization that recursively splits the attribute at the cut maximizing information gain, until the gain is lower than the minimal description length of the cut. This discretization can result in an arbitrary number of intervals, including a single interval, in which case the attribute is discarded as useless (removed).
- Equal-frequency splits the attribute into a given number of intervals, so that they each contain approximately the same number of instances.
- Equal-width evenly splits the range between the smallest and the largest observed value. The Number of intervals can be set manually.
- The widget can also be set to leave the attributes continuous or to remove them.
- To treat attributes individually, go to Individual Attribute Settings. These show the specific discretization of each attribute and allow changes. First, the top left list shows the cut-off points for each attribute. In the snapshot, we used the entropy-MDL discretization, which determines the optimal number of intervals automatically; we can see it split age at seven cut-offs (21.50, 23.50, 27.50, 35.50, 43.50, 54.50 and 61.50), giving eight intervals, while capital-gain got split into many intervals with several cut-offs. The final weight (fnlwgt), for instance, was left with a single interval and thus removed. On the right, we can select a specific discretization method for each attribute. Because fnlwgt would be removed by the MDL-based discretization, we can prevent its removal by selecting the attribute and choosing, for instance, Equal-frequency discretization. We could also choose to leave the attribute continuous.
- Produce a report.
- Tick Apply automatically for the widget to automatically commit changes. Alternatively, press Apply.
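The Equal-width and Equal-frequency methods described above, and the way a list of cut-offs turns a continuous value into an interval, can be sketched in plain Python. This is only an illustration of the idea, not the code Orange runs: the function names are hypothetical, the sample values are made up, and the tie rule (a value equal to a cut-off goes to the upper interval) is an assumption of this sketch.

```python
from bisect import bisect_right


def equal_width_cuts(values, n_intervals):
    """Cut-offs splitting [min, max] into n_intervals equally wide bins."""
    lo, hi = min(values), max(values)
    if lo == hi or n_intervals < 2:
        return []  # nothing to split
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def equal_frequency_cuts(values, n_intervals):
    """Cut-offs placed so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    cuts = []
    for i in range(1, n_intervals):
        pos = i * len(ordered) // n_intervals
        if pos == 0:
            continue
        left, right = ordered[pos - 1], ordered[pos]
        if left == right:
            continue  # cannot cut between identical values
        cut = (left + right) / 2  # midpoint between neighbouring values
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def assign_interval(value, cuts):
    """Index of the interval a value falls into, given sorted cut-offs."""
    return bisect_right(cuts, value)


# Made-up sample values, for illustration only.
ages = [17, 19, 22, 25, 30, 38, 45, 60]
print(equal_width_cuts(ages, 4))      # [27.75, 38.5, 49.25]
print(equal_frequency_cuts(ages, 4))  # [20.5, 27.5, 41.5]

# The age cut-offs quoted in the text above.
age_cuts = [21.5, 23.5, 27.5, 35.5, 43.5, 54.5, 61.5]
print(assign_interval(30, age_cuts))  # 3
```

Note that equal-width cut-offs depend only on the smallest and largest value, so a single outlier stretches every bin, while equal-frequency cut-offs follow the distribution of the data.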
Example
In the schema below, we show the Iris dataset with continuous attributes (as in the original data file) and with discretized attributes.
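For a class-labelled dataset such as Iris, the Entropy-MDL method picks cut-offs using the class values. The following is a minimal sketch of that recursive procedure with the Fayyad and Irani stopping rule as it is commonly stated; it is not Orange's implementation, all function names are hypothetical, and the values and labels are made up rather than taken from the Iris file.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (gain, index) of the split with the highest information gain.

    pairs is a list of (value, label) sorted by value; None if no split exists.
    """
    labels = [label for _, label in pairs]
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left, right = labels[:i], labels[i:]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdl_accepts(pairs, gain, i):
    """MDL criterion: keep the cut only if the gain pays for describing it."""
    labels = [label for _, label in pairs]
    left, right = labels[:i], labels[i:]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (log2(n - 1) + delta) / n


def entropy_mdl_cuts(values, labels):
    """Recursively split on the best cut until the MDL criterion rejects it."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def split(part):
        if len(part) < 2:
            return
        found = best_cut(part)
        if found is None:
            return
        gain, i = found
        if not mdl_accepts(part, gain, i):
            return
        cuts.append((part[i - 1][0] + part[i][0]) / 2)
        split(part[:i])
        split(part[i:])

    split(pairs)
    return sorted(cuts)


# Made-up data: the class changes cleanly between 6 and 7.
values = list(range(1, 13))
separable = ["a"] * 6 + ["b"] * 6
print(entropy_mdl_cuts(values, separable))  # [6.5]

# Made-up data: alternating classes, so no cut is worth its description cost.
noisy = ["a", "b"] * 6
print(entropy_mdl_cuts(values, noisy))  # []
```

The second call returns no cut-offs, which corresponds to the single-interval case in which the attribute carries no useful information about the class.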