Difference between revisions of "Orange: Impute"

From OnnoWiki
Latest revision as of 09:00, 18 April 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/data/impute.html


The Impute widget replaces unknown values in the data.

Input

Data: input dataset
Learner: learning algorithm for imputation

Output

Data: dataset with imputed values

Some of Orange's algorithms and visualizations cannot handle unknown values in the data. The Impute widget does what statisticians call imputation: it replaces missing values with values that are either computed from the data or set by the user. The default imputation is model-based (1-NN).

Impute-stamped.png


  • In the top-most box, Default method, the user can specify a general imputation technique for all attributes.
    • Don’t Impute does nothing with the missing values.
    • Average/Most-frequent uses the average value (for continuous attributes) or the most common value (for discrete attributes).
    • As a distinct value creates new values to substitute the missing ones.
    • Model-based imputer constructs a model for predicting the missing value, based on the values of the other attributes; a separate model is constructed for each attribute. The default model is the 1-NN learner, which takes the value from the most similar example (this is sometimes referred to as hot deck imputation). This algorithm can be replaced by one that the user connects to the input signal Learner for Imputation. Note, however, that if the data contains both discrete and continuous attributes, the algorithm must be able to handle both; at the moment only the 1-NN learner can do that. (In the future, when Orange has more regressors, the Impute widget may have separate input signals for discrete and continuous models.)
    • Random values computes the distributions of values for each attribute and then imputes by picking random values from them.
    • Remove examples with missing values removes every example that contains missing values. This check also applies to the class attribute if Impute class values is checked.
  • It is possible to specify an individual treatment for each attribute, which overrides the default treatment. One can also specify a manually defined value to be used for imputation. In the screenshot, we decided not to impute the values of “normalized-losses” and “make”; the missing values of “aspiration” will be replaced by random values, while the missing values of “body-style” and “drive-wheels” are replaced by “hatchback” and “fwd”, respectively. If the values of “length”, “width” or “height” are missing, the example is discarded. Values of all other attributes use the default method set above (model-based imputer, in our case).
  • The imputation methods for individual attributes are the same as the default methods.
  • Restore All to Default resets the individual attribute treatments to default.
  • Produce a report.
  • All changes are committed immediately if Apply automatically is checked. Otherwise, Apply needs to be clicked to apply any new settings.
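
The interplay between the default method and the per-attribute overrides described above can be sketched in plain Python. This is only an illustration and not Orange's own code: the function names (fill_column, impute_table), the method labels, the "N/A" placeholder and the toy car rows are all assumptions made for this example. It also computes averages only over the rows that survive removal, which is a simplification.

```python
import random
from collections import Counter
from statistics import mean

MISSING = None  # stand-in for an unknown value ("?" in a Data Table)


def fill_column(values, method, fixed=None, rng=None):
    """Return a copy of one column with its missing entries treated by `method`."""
    known = [v for v in values if v is not MISSING]
    if method == "dont_impute":
        return list(values)
    if method == "fixed":  # manually defined value
        return [fixed if v is MISSING else v for v in values]
    if method == "distinct_value":  # hypothetical placeholder label
        return ["N/A" if v is MISSING else v for v in values]
    if not known:
        return list(values)  # nothing to compute a fill value from
    if method == "average_most_frequent":
        if all(isinstance(v, (int, float)) for v in known):
            fill = mean(known)  # continuous attribute
        else:
            fill = Counter(known).most_common(1)[0][0]  # discrete attribute
        return [fill if v is MISSING else v for v in values]
    if method == "random":
        rng = rng or random.Random()
        return [rng.choice(known) if v is MISSING else v for v in values]
    raise ValueError(f"unknown method: {method}")


def impute_table(rows, default, overrides=None, rng=None):
    """Apply `default` to every column unless `overrides` names another method."""
    if not rows:
        return []
    overrides = overrides or {}
    columns = list(rows[0])

    def spec(col):
        chosen = overrides.get(col, default)
        return chosen if isinstance(chosen, tuple) else (chosen, None)

    # "remove_rows" discards whole rows, so it is handled before any filling.
    drop_cols = [c for c in columns if spec(c)[0] == "remove_rows"]
    kept = [r for r in rows if all(r[c] is not MISSING for c in drop_cols)]

    filled = {}
    for col in columns:
        method, fixed = spec(col)
        column = [r[col] for r in kept]
        if method == "remove_rows":
            filled[col] = column
        else:
            filled[col] = fill_column(column, method, fixed, rng)
    return [{c: filled[c][i] for c in columns} for i in range(len(kept))]


# Toy rows with made-up values; only the attribute names come from the text.
cars = [
    {"make": "audi", "body-style": "sedan", "length": 176.0, "width": 66.0},
    {"make": None, "body-style": None, "length": 168.0, "width": None},
    {"make": "bmw", "body-style": "sedan", "length": None, "width": 64.0},
    {"make": "audi", "body-style": "wagon", "length": 172.0, "width": 68.0},
]
result = impute_table(
    cars,
    default="average_most_frequent",
    overrides={
        "make": "dont_impute",
        "body-style": ("fixed", "hatchback"),
        "length": "remove_rows",
    },
)
```

Here the row with a missing “length” is discarded, “make” is left untouched, “body-style” receives the manually defined value, and “width” falls back to the default method.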

Example

To demonstrate how the Impute widget works, we played around with the Iris dataset and deleted some of the data. We used the Impute widget and selected the Model-based imputer to impute the missing values. In another Data Table, we can see how the question marks turned into distinct values (“Iris-setosa”, “Iris-versicolor”).

Impute-Example.png
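
What the model-based (1-NN) imputer does in this example can be approximated with a few lines of plain Python. The sketch below is not the widget's implementation: the helper names (distance, impute_1nn), the plain Euclidean distance without normalization and the four-row sample are assumptions chosen for illustration.

```python
import math

MISSING = None  # stand-in for the "?" shown in the Data Table


def distance(a, b, features):
    """Euclidean distance over the features that are known in both rows."""
    shared = [f for f in features if a[f] is not MISSING and b[f] is not MISSING]
    if not shared:
        return math.inf
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared))


def impute_1nn(rows, target, features):
    """Fill a missing `target` with the value from the most similar donor row."""
    donors = [r for r in rows if r[target] is not MISSING]
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if row[target] is MISSING and donors:
            nearest = min(donors, key=lambda d: distance(row, d, features))
            new_row[target] = nearest[target]
        result.append(new_row)
    return result


FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]

# A tiny Iris-like sample in which two class labels were deleted.
iris = [
    {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4,
     "petal width": 0.2, "iris": "Iris-setosa"},
    {"sepal length": 4.9, "sepal width": 3.0, "petal length": 1.4,
     "petal width": 0.2, "iris": MISSING},
    {"sepal length": 7.0, "sepal width": 3.2, "petal length": 4.7,
     "petal width": 1.4, "iris": "Iris-versicolor"},
    {"sepal length": 6.4, "sepal width": 3.2, "petal length": 4.5,
     "petal width": 1.5, "iris": MISSING},
]
filled = impute_1nn(iris, target="iris", features=FEATURES)
```

Each row with a deleted label copies the label of its nearest neighbor among the rows where the label is known, which is the hot deck idea mentioned earlier.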

