Orange: Discretize
Revision as of 08:16, 27 January 2020 by Onnowpurbo
Source: https://docs.biolab.si//3/visual-programming/widgets/data/discretize.html
Discretizes continuous attributes from an input dataset.
Input
Data: input dataset
Output
Data: dataset with discretized values
The Discretize widget discretizes continuous attributes with a selected method.
- The basic version of the widget is rather simple. It lets you choose between three discretization methods.
- Entropy-MDL, invented by Fayyad and Irani, is a top-down discretization that recursively splits the attribute at the cut that maximizes information gain, until the gain is lower than the minimal description length of the cut. This discretization can produce an arbitrary number of intervals, including a single interval, in which case the attribute is discarded as useless (removed).
- Equal-frequency splits the attribute into a given number of intervals, so that each contains approximately the same number of instances.
- Equal-width evenly splits the range between the smallest and the largest observed value. The number of intervals can be set manually.
- The widget can also be set to leave the attributes continuous or to remove them.
- To treat attributes individually, go to Individual Attribute Settings. These settings show the specific discretization of each attribute and allow changes. First, the top left list shows the cut-off points for each attribute. In the snapshot, we used the entropy-MDL discretization, which determines the optimal number of intervals automatically; we can see it discretized age into eight intervals, with seven cut-offs at 21.50, 23.50, 27.50, 35.50, 43.50, 54.50 and 61.50, while capital-gain got split into many intervals with several cut-offs. The final weight (fnlwgt), for instance, was left with a single interval and thus removed. On the right, we can select a specific discretization method for each attribute. Attribute “fnlwgt” would be removed by the MDL-based discretization, so to prevent its removal, we select the attribute and choose, for instance, Equal-frequency discretization. We could also choose to leave the attribute continuous.
- Produce a report.
- Tick Apply automatically for the widget to automatically commit changes. Alternatively, press Apply.
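The three discretization methods described above can be sketched in plain Python. This is only an illustration of the ideas, not Orange's actual implementation: the function names (equal_width_cuts, equal_frequency_cuts, mdl_cuts) are hypothetical, the equal-frequency rule for placing cut-offs is a simplification, and the MDL stopping test follows the commonly cited Fayyad and Irani criterion.

```python
from collections import Counter
from math import log2


def equal_width_cuts(values, n_intervals):
    """Cut-offs that split [min, max] into n_intervals equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def equal_frequency_cuts(values, n_intervals):
    """Cut-offs chosen so each bin holds roughly the same number of values."""
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for i in range(1, n_intervals):
        pos = i * n // n_intervals
        # Place the cut halfway between neighbours; skip ties and duplicates.
        if 0 < pos < n and xs[pos - 1] != xs[pos]:
            cut = (xs[pos - 1] + xs[pos]) / 2
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mdl_cuts(values, labels):
    """Recursive entropy-based splitting with an MDL stopping criterion."""
    pairs = sorted(zip(values, labels))
    return _split([v for v, _ in pairs], [y for _, y in pairs])


def _split(xs, ys):
    n = len(xs)
    ent = entropy(ys)
    best = None  # (gain, index) of the best candidate cut so far
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        gain = ent - i / n * entropy(ys[:i]) - (n - i) / n * entropy(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, i)
    if best is None:
        return []
    gain, i = best
    left, right = ys[:i], ys[i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * ent - k1 * entropy(left) - k2 * entropy(right))
    # Stop when the gain does not pay for the cost of describing the cut.
    if gain <= (log2(n - 1) + delta) / n:
        return []
    cut = (xs[i - 1] + xs[i]) / 2
    return _split(xs[:i], left) + [cut] + _split(xs[i:], right)
```

An empty list returned by mdl_cuts corresponds to the single-interval case, in which the widget would remove the attribute. The two fixed-interval functions always respect the requested number of intervals, except that equal_frequency_cuts may return fewer cut-offs when the data contain many repeated values.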
Example
In the schema below, we show the Iris dataset with continuous attributes (as in the original data file) and with discretized attributes.