Orange: Impute
Sumber: https://docs.biolab.si//3/visual-programming/widgets/data/impute.html
Ganti nilai unknown (tidak di ketahui) di data.
Input
Data: input dataset Learner: learning algorithm for imputation
Output
Data: dataset with imputed values
Beberapa algoritma dan visualisasi Orange tidak dapat menangani nilai yang tidak diketahui dalam data. Widget ini melakukan apa yang oleh ahli statistik disebut imputasi: ia menggantikan nilai yang hilang dengan nilai yang dihitung dari data atau ditetapkan oleh pengguna. Imputasi default adalah (1-NN).
- In the top-most box, Default method, the user can specify a general imputation technique for all attributes.
- Don’t Impute does nothing with the missing values.
- Average/Most-frequent uses the average value (for continuous attributes) or the most common value (for discrete attributes).
- As a distinct value creates new values to substitute the missing ones.
- Model-based imputer constructs a model for predicting the missing value, based on values of other attributes; a separate model is constructed for each attribute. The default model is 1-NN learner, which takes the value from the most similar example (this is sometimes referred to as hot deck imputation). This algorithm can be substituted by one that the user connects to the input signal Learner for Imputation. Note, however, that if there are discrete and continuous attributes in the data, the algorithm needs to be capable of handling them both; at the moment only 1-NN learner can do that. (In the future, when Orange has more regressors, the Impute widget may have separate input signals for discrete and continuous models.)
- Random values computes the distributions of values for each attribute and then imputes by picking random values from them.
- Remove examples with missing values removes the example containing missing values. This check also applies to the class attribute if Impute class values is checked.
- It is possible to specify individual treatment for each attribute, which overrides the default treatment set. One can also specify a manually defined value used for imputation. In the screenshot, we decided not to impute the values of “normalized-losses” and “make”, the missing values of “aspiration” will be replaced by random values, while the missing values of “body-style” and “drive-wheels” are replaced by “hatchback” and “fwd”,respectively. If the values of “length”, “width” or “height” are missing, the example is discarded. Values of all other attributes use the default method set above (model-based imputer, in our case).
- The imputation methods for individual attributes are the same as default methods.
- Restore All to Default resets the individual attribute treatments to default.
- Produce a report.
- All changes are committed immediately if Apply automatically is checked. Otherwise, Apply needs to be ticked to apply any new settings.
Contoh
Untuk mendemonstrasikan cara kerja Impute widget, kita bermain-main dengan dataset Iris dan menghapus beberapa data. Kita menggunakan Impute widget dan memilih imputer berbasis Model untuk meng-impute nilai yang hilang. Di Data Table lain, kita melihat bagaimana tanda tanya berubah menjadi nilai yang berbeda ("Iris-setosa," Iris-versicolor ").