Orange: Impute

From OnnoWiki
Jump to navigation Jump to search

Sumber: https://docs.biolab.si//3/visual-programming/widgets/data/impute.html


Widget Impute mengganti nilai unknown (tidak di ketahui) di data.

Input

Data: input dataset
Learner: learning algorithm for imputation

Output

Data: dataset with imputed values

Beberapa algoritma dan visualisasi Orange tidak dapat menangani nilai yang tidak diketahui dalam data. Widget Impute melakukan apa yang oleh ahli statistik disebut imputasi: ia menggantikan nilai yang hilang dengan nilai yang dihitung dari data atau ditetapkan oleh pengguna. Imputasi default adalah (1-NN).

Impute-stamped.png


  • In the top-most box, Default method, the user can specify a general imputation technique for all attributes.
    • Don’t Impute does nothing with the missing values.
    • Average/Most-frequent uses the average value (for continuous attributes) or the most common value (for discrete attributes).
    • As a distinct value creates new values to substitute the missing ones.
    • Model-based imputer constructs a model for predicting the missing value, based on values of other attributes; a separate model is constructed for each attribute. The default model is 1-NN learner, which takes the value from the most similar example (this is sometimes referred to as hot deck imputation). This algorithm can be substituted by one that the user connects to the input signal Learner for Imputation. Note, however, that if there are discrete and continuous attributes in the data, the algorithm needs to be capable of handling them both; at the moment only 1-NN learner can do that. (In the future, when Orange has more regressors, the Impute widget may have separate input signals for discrete and continuous models.)
    • Random values computes the distributions of values for each attribute and then imputes by picking random values from them.
    • Remove examples with missing values removes the example containing missing values. This check also applies to the class attribute if Impute class values is checked.
  • It is possible to specify individual treatment for each attribute, which overrides the default treatment set. One can also specify a manually defined value used for imputation. In the screenshot, we decided not to impute the values of “normalized-losses” and “make”, the missing values of “aspiration” will be replaced by random values, while the missing values of “body-style” and “drive-wheels” are replaced by “hatchback” and “fwd”,respectively. If the values of “length”, “width” or “height” are missing, the example is discarded. Values of all other attributes use the default method set above (model-based imputer, in our case).
  • The imputation methods for individual attributes are the same as default methods.
  • Restore All to Default resets the individual attribute treatments to default.
  • Produce a report.
  • All changes are committed immediately if Apply automatically is checked. Otherwise, Apply needs to be ticked to apply any new settings.

Contoh

Untuk mendemonstrasikan cara kerja widget Impute, kita bermain-main dengan dataset Iris dan menghapus beberapa data. Kita menggunakan widget Impute dan memilih imputer berbasis Model untuk meng-impute nilai yang hilang. Di Data Table lain, kita melihat bagaimana tanda tanya berubah menjadi nilai yang berbeda ("Iris-setosa," Iris-versicolor ").

Impute-Example.png


Referensi

Pranala Menarik