Orange: Feature Ranking
Source: https://orange.biolab.si/workflows/
For supervised problems, where data instances are described by a class label, we want to know which features are the most informative. The Rank widget provides a table of features with their informativeness scores and supports manual feature selection. In this workflow, we use it to find the two best features (out of the 79 initial features in the selected dataset) and display them in the Scatter Plot widget.
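Information gain is one of the scoring methods the Rank widget offers. The sketch below shows how such a score can be computed and used to pick the two top features. It uses a small hypothetical dataset of discrete features. The function names (`entropy`, `info_gain`, `rank_features`) and the toy data are illustrative assumptions, not Orange's API. Orange computes its scores internally and provides several other scorers as well.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Information gain of one discrete feature with respect to the class."""
    n = len(labels)
    remainder = 0.0
    # Weighted entropy of the class within each value of the feature
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, feature_names, labels):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores.append((name, info_gain(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: three discrete features and a binary class
feature_names = ["color", "size", "noise"]
rows = [
    ("red", "big", "a"),
    ("red", "big", "b"),
    ("red", "small", "c"),
    ("blue", "small", "a"),
    ("blue", "small", "b"),
    ("blue", "big", "c"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

ranking = rank_features(rows, feature_names, labels)
top_two = [name for name, _ in ranking[:2]]  # the two most informative features
```

Here `color` separates the classes perfectly, so its score is 1 bit. `size` is weakly informative, and `noise` carries no information. Taking the first two entries of the sorted ranking mirrors selecting the two top-ranked rows in the widget.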
The workflow image shows an Orange Data Mining workflow designed for data preprocessing, feature selection, and visualization.
Workflow Breakdown:
1. File (Data Input)
- Loads a dataset that may contain missing values and multiple features.
2. Impute (Handling Missing Values)
- This step fills in missing values to ensure that all data points are available for further processing.
- The note mentions that imputation was necessary for proper visualization.
3. Rank (Feature Selection)
- This widget scores how informative each feature is with respect to the class label and ranks the features by that score.
- The two most informative features were selected for further analysis.
4. Scatter Plot (Visualization)
- The selected features are used to generate a scatter plot.
- The note suggests checking whether the most informative features provide a good class separation in the data.
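The Impute, Rank, and Scatter Plot steps can be approximated in plain Python. This is a minimal sketch that rests on three assumptions:

- Missing values (`None`) are replaced by the column average, which is one common imputation strategy.
- The two features shown are assumed to be the top-ranked ones.
- "Good class separation" is quantified with a nearest-centroid accuracy.

That last measure is a swapped-in stand-in for visually inspecting the scatter plot, not something the Scatter Plot widget computes. It is also evaluated on the same points used to build the centroids, so it gives an optimistic estimate. The function names and data are hypothetical.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def nearest_centroid_accuracy(points, labels):
    """Fraction of points whose nearest class centroid is their own class."""
    # Centroid of each class = per-axis mean of its points
    centroids = {}
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids[cls] = tuple(mean(axis) for axis in zip(*members))

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    correct = 0
    for p, y in zip(points, labels):
        nearest = min(centroids, key=lambda cls: sq_dist(p, centroids[cls]))
        if nearest == y:
            correct += 1
    return correct / len(points)


# Hypothetical values of the two top-ranked numeric features, with gaps (None)
f1 = [1.0, 1.2, None, 4.8, 5.1, 5.0]
f2 = [2.0, None, 2.2, 7.9, 8.1, None]
labels = ["A", "A", "A", "B", "B", "B"]

f1 = impute_mean(f1)  # the gap becomes the mean of the observed f1 values
f2 = impute_mean(f2)
points = list(zip(f1, f2))  # (x, y) coordinates as they would appear in the plot
accuracy = nearest_centroid_accuracy(points, labels)
```

Imputation comes first because points with a missing coordinate cannot be placed in a two-dimensional plot. An accuracy close to 1.0 suggests the two classes form distinct groups along the chosen features.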
Purpose of the Workflow:
- Handles missing values through imputation.
- Ranks features based on informativeness to select the best predictors.
- Visualizes the most relevant features using a scatter plot to assess data separability.
This workflow is useful for feature engineering and exploratory data analysis, helping select the best attributes for machine learning models.
References
- YouTube: https://www.youtube.com/watch?v=p5XLWmSUxTQ
- YouTube: https://www.youtube.com/watch?v=Fw5SztV5p3E
- https://orange.biolab.si/workflows/