Orange: Cross Validation
Source: https://orange.biolab.si/workflows/
How well do supervised data mining methods perform at classifying our dataset? The workflow below scores several classification techniques on a dataset (iris in this example). The central widget is Test & Score, which takes the data and a set of learners, performs cross-validation, scores predictive accuracy, and outputs the scores for further examination.
The image represents an Orange Data Mining workflow designed for evaluating multiple classification models using cross-validation and analyzing misclassifications.
Workflow Breakdown:
1. File (Data Input)
- The dataset (e.g., "iris.tab") is loaded from documentation datasets.
- The note suggests that checking the dataset first is a good practice.
2. Data Table
- Displays the loaded dataset for initial inspection before applying machine learning models.
3. Learners (Classification Models)
- Three classification models are used:
  - Logistic Regression
  - Random Forest Classification
  - Support Vector Machine (SVM)
- Multiple models can be used simultaneously for comparison.
4. Test & Score (Cross-Validation)
- Performs cross-validation to evaluate model performance.
- The note suggests double-clicking on this node to examine detailed performance scores.
5. Confusion Matrix
- Shows, for each model, how many instances of each actual class were assigned to each predicted class, which makes misclassifications visible.
- The note suggests that selecting a cell in the confusion matrix provides additional data insights.
6. Data Table (1)
- Displays the instances from the cells selected in the Confusion Matrix.
- Users can examine these misclassified data points in a spreadsheet format.
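As a rough script-level counterpart to steps 1 through 4, the sketch below cross-validates the same three kinds of learners on iris. It uses scikit-learn as a stand-in rather than Orange's own scripting API, and the function name `evaluate_learners`, the 5-fold stratified split, and the hyperparameters are assumptions chosen for illustration, not settings taken from the workflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def evaluate_learners(n_folds=5, seed=0):
    """Return the mean cross-validated accuracy of each learner on iris."""
    X, y = load_iris(return_X_y=True)

    # Assumed hyperparameters; the workflow itself does not state any.
    learners = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(),
    }

    # One shared splitter so every learner is scored on the same folds.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)

    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in learners.items()
    }


if __name__ == "__main__":
    for name, accuracy in evaluate_learners().items():
        print(f"{name}: {accuracy:.3f}")
```

Passing one splitter to every learner mirrors the idea of a single Test & Score node receiving several learners: the scores are comparable because each model sees the same train/test partitions.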
Purpose of the Workflow:
- Loads and previews a labeled dataset.
- Trains and evaluates multiple classification models simultaneously.
- Uses cross-validation to compare model performance.
- Analyzes misclassification patterns using a confusion matrix.
- Allows further examination of misclassified instances via a separate data table.
This workflow is useful for model comparison, error analysis, and performance evaluation, helping users select the best model for a classification task.
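The error-analysis path (Confusion Matrix feeding a second Data Table) can be pictured as grouping row indices by their (actual, predicted) pair and then pulling out the rows in a chosen cell. The following is a minimal pure-Python sketch of that idea; the helper names `confusion_cells` and `misclassified` are hypothetical, and the label lists are made-up illustration data, not output from any model in the workflow.

```python
from collections import defaultdict


def confusion_cells(actual, predicted):
    """Map each (actual, predicted) cell to the row indices that fall in it."""
    cells = defaultdict(list)
    for index, (a, p) in enumerate(zip(actual, predicted)):
        cells[(a, p)].append(index)
    return dict(cells)


def misclassified(cells):
    """Collect the row indices from every off-diagonal cell."""
    return sorted(i for (a, p), rows in cells.items() if a != p for i in rows)


# Hypothetical labels for illustration only, not real model output.
actual = ["setosa", "versicolor", "versicolor", "virginica", "virginica", "virginica"]
predicted = ["setosa", "versicolor", "virginica", "virginica", "versicolor", "virginica"]

cells = confusion_cells(actual, predicted)

# "Selecting" one cell: versicolor instances predicted as virginica.
selected_rows = cells[("versicolor", "virginica")]
print("Selected cell rows:", selected_rows)
print("All misclassified rows:", misclassified(cells))
```

Keeping row indices per cell, rather than only counts, is what allows a selection in the matrix to be traced back to the original data rows for inspection.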