Orange: Outliers

From OnnoWiki
Revision as of 07:40, 11 January 2020 by Onnowpurbo

Source: https://docs.biolab.si//3/visual-programming/widgets/data/outliers.html


Simple outlier detection by comparing distances between instances.

Inputs

   Data: input dataset
   Distances: distance matrix

Outputs

   Outliers: instances scored as outliers
   Inliers: instances not scored as outliers

The Outliers widget applies one of two methods for outlier detection. Both methods treat detection as a classification problem: one uses a one-class SVM and the other an elliptic envelope fitted by a covariance estimator. One-class SVM with a non-linear kernel (RBF) performs well on data with non-Gaussian distributions, while the covariance estimator is suitable only for data with a Gaussian distribution.
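The two methods can be tried outside the widget as well. The sketch below is not the widget's own code: it assumes scikit-learn's OneClassSVM and EllipticEnvelope as stand-ins for the two options, and the data, parameter values and the names detectors and results are made up for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: a Gaussian cloud plus a few far-away points.
inlier_cloud = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([inlier_cloud, far_points])

# Assumed stand-ins for the two widget options (illustrative settings).
detectors = {
    "one-class SVM (RBF)": OneClassSVM(kernel="rbf", nu=0.1, gamma=0.1),
    "covariance estimator": EllipticEnvelope(
        contamination=0.1, support_fraction=0.9, random_state=0
    ),
}

results = {}
for name, detector in detectors.items():
    labels = detector.fit(X).predict(X)  # +1 = inlier, -1 = outlier
    # Split the data the same way the widget's two outputs do.
    results[name] = {"outliers": X[labels == -1], "inliers": X[labels == 1]}
    print(name, "->", len(results[name]["outliers"]), "outliers")
```

Each entry of results mirrors the widget's two outputs: one array of instances scored as outliers and one of inliers.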

../../_images/Outliers-stamped.png

   Information on the input data, number of inliers and outliers based on the selected model.
   Select the outlier detection method:
       One-class SVM with non-linear kernel (RBF): classifies data as similar to or different from the core class:
           Nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors
           Kernel coefficient is the gamma parameter, which specifies how much influence a single data instance has
       Covariance estimator: fits an ellipse to the central points using the Mahalanobis distance metric
           Contamination is the proportion of outliers in the dataset
           Support fraction specifies the proportion of points included in the estimate
   Produce a report.
   Click Detect outliers to output the data.
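To make the covariance estimator option more concrete, here is a minimal sketch of Mahalanobis-distance scoring with a contamination threshold, in plain Python for 2-D points. It is a simplification and not the widget's implementation: it uses the ordinary sample mean and covariance of all points, whereas a robust estimator would fit them on a central subset (which is what the support fraction controls). The function name mahalanobis_outliers and the data are hypothetical.

```python
import math
import random


def mahalanobis_outliers(points, contamination=0.1):
    """Flag the `contamination` fraction of 2-D points farthest from the mean.

    Simplification: uses the plain sample mean and covariance of all points,
    not a robust estimate fitted on a central subset.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    inv_xx, inv_yy, inv_xy = syy / det, sxx / det, -sxy / det

    def distance(p):
        dx, dy = p[0] - mean_x, p[1] - mean_y
        return math.sqrt(
            dx * dx * inv_xx + 2 * dx * dy * inv_xy + dy * dy * inv_yy
        )

    distances = [distance(p) for p in points]

    # Flag the points with the largest distances.
    n_outliers = int(round(contamination * n))
    ranked = sorted(range(n), key=lambda i: distances[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(n)], distances


# Hypothetical data: 95 points from a Gaussian cloud plus 5 planted outliers.
rng = random.Random(0)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
planted = [(8.0, 8.0), (-9.0, 7.5), (10.0, -8.0), (-8.5, -9.0), (9.5, 0.0)]
points = cloud + planted

is_outlier, distances = mahalanobis_outliers(points, contamination=0.05)
print(sum(is_outlier), "of", len(points), "points flagged")
```

Contamination here simply fixes how many of the most distant points are reported as outliers, which matches its description as the proportion of outliers in the dataset.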

Example

Below is a simple example of how to use this widget. We used the Iris dataset to detect outliers. We chose the one-class SVM with non-linear kernel (RBF) method, with Nu set to 20% (fewer training errors, more support vectors). We then observed the outliers in the Data Table widget and sent the inliers to the Scatter Plot.

../../_images/Outliers-Example.png
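A rough script equivalent of this workflow is sketched below. It assumes scikit-learn's copy of the Iris data and its OneClassSVM as a stand-in for the widget. The kernel coefficient used in the original example is not stated, so gamma="scale" is an assumption, and the exact instances flagged may differ from those in the screenshot.

```python
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

# Iris measurements only; the class labels are not used for detection.
X = load_iris().data

# Nu = 20% as in the example above; gamma="scale" is an assumed setting.
detector = OneClassSVM(kernel="rbf", nu=0.2, gamma="scale")
labels = detector.fit_predict(X)  # +1 = inlier, -1 = outlier

outliers = X[labels == -1]  # would go to the Data Table
inliers = X[labels == 1]    # would go to the Scatter Plot

print(len(outliers), "outliers,", len(inliers), "inliers")
```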


