Data Science: Building a Machine Learning Model
The figure provides a visual overview of the machine learning model-building process, illustrating key steps from data preprocessing to model evaluation.
1. Initial Dataset and Preprocessing
- The process starts with an initial dataset containing input variables (X) and an output variable (Y).
- Exploratory Data Analysis (EDA) is performed using techniques like PCA (Principal Component Analysis) and SOM (Self-Organizing Map).
- Data cleaning, data curation, and removal of redundant features are carried out to prepare a pre-processed dataset.
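One common way to remove redundant features is to drop any variable that is almost perfectly correlated with one already kept. The sketch below illustrates that idea only; it is not necessarily the method used in the figure. The helper names (`pearson`, `drop_redundant`), the 0.95 threshold, and the toy columns are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    # A constant column has no spread; treat its correlation as 0.
    return cov / (spread_a * spread_b) if spread_a and spread_b else 0.0

def drop_redundant(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    The threshold of 0.95 is an illustrative assumption, not a recommended value.
    """
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept

# Toy input variables (X); the values are made up for illustration.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "age": [40, 22, 35, 51, 28],
}
selected = drop_redundant(features)
```

Because features are visited in insertion order, the first of two redundant columns is the one that survives.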
2. Data Splitting
- The dataset is split into an 80% training set (used to train the model) and a 20% test set (used to evaluate the model).
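The 80/20 split can be sketched in a few lines of standard-library Python. The function name `train_test_split`, the fixed seed, and the toy dataset are assumptions for illustration; libraries such as scikit-learn provide a more complete utility with the same name.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy dataset of (x, y) pairs; the values are illustrative only.
data = [(i, 2 * i) for i in range(10)]
train, test = train_test_split(data)
```

Shuffling before splitting matters when the rows are stored in a meaningful order, for example sorted by the output variable.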
3. Model Training and Optimization
- Different learning algorithms such as Support Vector Machine (SVM), Deep Learning (DL), Random Forest (RF), Decision Tree (DT), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN) are applied.
- Hyperparameter optimization is conducted to improve model performance.
- Feature selection identifies the input variables that contribute most to the prediction.
- Cross-validation estimates how well the trained model generalizes to data it has not seen.
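Hyperparameter optimization and cross-validation are often combined: each candidate setting is scored by its average accuracy across several folds of the training set, and the best-scoring setting is kept. The following is a minimal sketch under stated assumptions: a one-feature K-Nearest Neighbors classifier, interleaved folds, and made-up data. All function names here are hypothetical.

```python
from collections import Counter

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; sample i is validated in fold i % n_folds."""
    for fold in range(n_folds):
        val_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, val_idx

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (single feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k_neighbors, n_folds=5):
    """Mean validation accuracy of KNN across the folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), n_folds):
        train = [data[i] for i in train_idx]
        correct = sum(
            knn_predict(train, data[i][0], k_neighbors) == data[i][1]
            for i in val_idx
        )
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)

# Toy training set: label is 1 when the feature is at least 10 (illustrative only).
data = [(x, int(x >= 10)) for x in range(20)]

# Hyperparameter search: try each candidate number of neighbors, keep the best.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda k: cv_accuracy(data, k))
```

Only the training set is used here; the test set stays untouched until the final evaluation so that it remains an unbiased check.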
4. Model Evaluation
- The trained model produces predicted Y values and is evaluated based on:
- Classification Metrics: Accuracy, Sensitivity, Specificity, and Matthews Correlation Coefficient (MCC).
- Regression Metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² (coefficient of determination).
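The metrics listed above follow directly from their standard definitions. The sketch below computes them for binary labels (0 and 1) and for numeric predictions; the function names and the sample values are assumptions for illustration, and undefined ratios are reported as 0.0 by convention in this sketch.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R² for numeric targets."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,
    }

# Made-up true and predicted values, for illustration only.
cls = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
reg = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

Sensitivity is the fraction of actual positives that were predicted positive, and specificity is the fraction of actual negatives that were predicted negative. MCC ranges from -1 to 1 and remains informative when the classes are imbalanced.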
5. Final Evaluation
- The model’s performance is assessed, determining whether it meets the desired accuracy and reliability before deployment.
This visual guide summarizes the core steps involved in training and evaluating a machine learning model.