Keras: How to Train a Final Machine Learning Model
Sumber: https://machinelearningmastery.com/train-final-machine-learning-model/
The machine learning model that we use to make predictions on new data is called the final model. There can be some confusion in applied machine learning about how to train a final model. This confusion is seen in beginners to the field who ask questions such as:
- How do I make predictions with cross-validation?
- Which model do I choose from cross-validation?
- Do I use the model after preparing it on the training dataset?
Hopefully this post clears up the confusion. In this post, we will discover how to finalize a machine learning model in order to make predictions on new data.
What Is a Final Model?
A final machine learning model is the model that we use to make predictions on new data. That is, given new examples of input data, we want to use the model to predict the expected output. This may be a classification (assigning a label) or a regression (a real value).
For example, whether a photo is a picture of a dog or a cat, or the estimated number of sales for tomorrow.
The goal of your machine learning project is to arrive at the best final model, where "best" is defined by:
- Data: the historical data that you have available.
- Time: the amount of time you have to spend on the project.
- Procedure: the data preparation steps, the algorithm, and the algorithm configuration choices.
In your project, you gather the data, spend the time you have, and discover the data preparation procedures, the algorithm to use, and how to configure it.
The final model is the pinnacle of this process, the end point at which you finally start actually making predictions.
The Purpose of Train/Test Sets
Why do we use train and test sets?
Splitting your dataset into training and testing sets is one method for quickly evaluating the performance of an algorithm on your problem.
The training dataset is used to prepare the model, to train it.
We pretend that the test dataset is new data, where the output values are withheld from the algorithm. We gather predictions from the trained model on the inputs of the test dataset and compare them to the withheld output values of the test set.
Comparing the predictions to the withheld outputs on the test dataset allows us to compute a performance measure for the model on the test dataset. This is an estimate of the skill of the algorithm, trained on the problem, when making predictions on unseen data.
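For example, a quick sketch of this evaluation in Python with scikit-learn might look like the following; the synthetic dataset, the kNN model, and the 33% split are my own illustrative choices, not part of the original post:

# A minimal sketch of the train/test evaluation described above, assuming
# scikit-learn and a synthetic classification dataset (X, y).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for your historical data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Hold back 33% of the rows as the "unseen" test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

# Train the model on the training set only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Compare predictions to the withheld outputs to estimate skill.
predictions = model.predict(X_test)
print("Estimated accuracy: %.3f" % accuracy_score(y_test, predictions))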
Let's unpack this further.
When we evaluate an algorithm, we are in fact evaluating all of the steps in the procedure, including how the training data was prepared (e.g. scaling), the choice of algorithm (e.g. kNN), and how the chosen algorithm was configured (e.g. k=3).
The performance measure calculated on the resulting predictions is an estimate of the performance of the whole procedure.
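One common way to treat "the whole procedure" as a single evaluable unit is a pipeline. The sketch below is only an illustration of that idea, assuming scikit-learn's Pipeline with scaling and kNN (k=3), and the same kind of synthetic data as above:

# A sketch of bundling the whole procedure (scaling + kNN with k=3) so that
# data preparation, algorithm, and configuration are evaluated as one unit.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

procedure = Pipeline([
    ("scale", StandardScaler()),                    # data preparation (e.g. scaling)
    ("knn", KNeighborsClassifier(n_neighbors=3)),   # algorithm + configuration (kNN, k=3)
])

# Fitting and scoring the pipeline evaluates all of these choices together.
procedure.fit(X_train, y_train)
print("Procedure accuracy: %.3f" % procedure.score(X_test, y_test))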
We generalize the performance measure from:
"how well the procedure performs on the test set"
to
"how well the procedure performs on unseen data".
This is quite a leap and requires that:
- The procedure is sufficiently robust that the estimated skill is close to what we can actually expect on unseen data.
- The choice of performance measure accurately captures what we are interested in measuring in predictions on unseen data.
- The choice of data preparation is well understood and repeatable on new data, and reversible if predictions need to be returned to their original scale or related to the original input values.
- The choice of algorithm makes sense for its intended use and operational environment (e.g. complexity or chosen programming language).
A lot rides on the estimated skill of the whole procedure on the test set.
In fact, using the train/test method of estimating the skill of the procedure on unseen data often has a high variance (unless we have a heck of a lot of data to split). This means that when it is repeated, it gives different results, often very different results.
The outcome is that we may be quite uncertain about how well the procedure actually performs on unseen data and how one procedure compares to another.
Often, time permitting, we prefer to use k-fold cross-validation instead.
The Purpose of k-fold Cross Validation
Why do we use k-fold cross validation?
Cross-validation is another method to estimate the skill of a method on unseen data, like using a train-test split.
Cross-validation systematically creates and evaluates multiple models on multiple subsets of the dataset.
This, in turn, provides a population of performance measures.
We can calculate the mean of these measures to get an idea of how well the procedure performs on average. We can calculate the standard deviation of these measures to get an idea of how much the skill of the procedure is expected to vary in practice.
This is also helpful for providing a more nuanced comparison of one procedure to another when you are trying to choose which algorithm and data preparation procedures to use.
Also, this information is invaluable as you can use the mean and spread to give a confidence interval on the expected performance of a machine learning procedure in practice.
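As a rough illustration of how such a population of scores might be produced and summarized, here is a sketch assuming scikit-learn, 10 folds, and the same synthetic dataset as above; the mean plus or minus two standard deviations is only a crude interval, not a formal confidence bound:

# A minimal sketch of k-fold cross-validation, assuming scikit-learn and
# a synthetic dataset standing in for your data.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# 10 folds: each row is used for testing exactly once across the 10 models.
cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=cv, scoring="accuracy")

# The population of scores summarizes the whole procedure, not any one model.
print("Mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
# A rough interval on expected performance: mean +/- 2 * std.
print("Expected range: [%.3f, %.3f]" % (scores.mean() - 2 * scores.std(), scores.mean() + 2 * scores.std()))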
Both train-test splits and k-fold cross validation are examples of resampling methods.
Why do we use Resampling Methods?
The problem with applied machine learning is that we are trying to model the unknown.
On a given predictive modeling problem, the ideal model is one that performs the best when making predictions on new data.
We don’t have new data, so we have to pretend with statistical tricks.
The train-test split and k-fold cross validation are called resampling methods. Resampling methods are statistical procedures for sampling a dataset and estimating an unknown quantity.
In the case of applied machine learning, we are interested in estimating the skill of a machine learning procedure on unseen data. More specifically, the skill of the predictions made by a machine learning procedure.
Once we have the estimated skill, we are finished with the resampling method.
If you are using a train-test split, that means you can discard the split datasets and the trained model. If you are using k-fold cross-validation, that means you can throw away all of the trained models.
They have served their purpose and are no longer needed.
You are now ready to finalize your model.
How to Finalize a Model?
You finalize a model by applying the chosen machine learning procedure on all of your data.
That’s it.
With the finalized model, you can:
- Save the model for later or operational use.
- Make predictions on new data.
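A minimal sketch of what finalization can look like in code, assuming scikit-learn for the model and joblib for persistence (the file name and dataset are placeholders):

# Finalizing a model: fit the chosen procedure on ALL available data,
# save it, then load it later to predict on new data.
import joblib
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Train the final model on every row we have, using the chosen configuration.
final_model = KNeighborsClassifier(n_neighbors=3)
final_model.fit(X, y)

# Save for later or operational use.
joblib.dump(final_model, "final_model.joblib")

# Later / in production: load the model and predict on new data.
loaded = joblib.load("final_model.joblib")
new_rows = X[:5]  # stand-in for genuinely new input data
print(loaded.predict(new_rows))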
What about the cross-validation models or the train-test datasets?
They’ve been discarded. They are no longer needed. They have served their purpose to help you choose a procedure to finalize.
Common Questions
This section lists some common questions you might have.
Why not keep the model trained on the training dataset?
and Why not keep the best model from the cross-validation?
You can if you like.
You may save time and effort by reusing one of the models trained during skill estimation.
This can be a big deal if it takes days, weeks, or months to train a model.
However, your model will likely perform better when trained on all of the available data than just the subset used to estimate the performance of the model.
This is why we prefer to train the final model on all available data.
Won’t the performance of the model trained on all of the data be different?
I think this question drives most of the misunderstanding around model finalization.
Put another way:
If you train a model on all of the available data, then how do you know how well the model will perform?
You have already answered this question using the resampling procedure.
If well designed, the performance measures you calculate using train-test or k-fold cross validation suitably describe how well the finalized model trained on all available historical data will perform in general.
If you used k-fold cross validation, you will have an estimate of how “wrong” (or conversely, how “right”) the model will be on average, and the expected spread of that wrongness or rightness.
This is why the careful design of your test harness is so absolutely critical in applied machine learning. A more robust test harness will allow you to lean on the estimated performance all the more.
Each time I train the model, I get a different performance score; should I pick the model with the best score?
Machine learning algorithms are stochastic and this behavior of different performance on the same data is to be expected.
Resampling methods like repeated train/test or repeated k-fold cross-validation will help to get a handle on how much variance there is in the method.
If it is a real concern, you can create multiple final models and take the mean from an ensemble of predictions in order to reduce the variance.
I talk more about this in the post:
Embrace Randomness in Machine Learning
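If you do go down the ensemble route, a minimal sketch might look like the following; the use of an MLP with different random seeds is just an assumed stand-in for any stochastic learning algorithm:

# Reducing variance by ensembling several final models trained on all data,
# assuming scikit-learn and a stochastic learner (an MLP with different seeds).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Train several final models on all of the data, varying only the random seed.
models = [MLPClassifier(max_iter=500, random_state=seed).fit(X, y) for seed in range(5)]

# Average the predicted probabilities across the ensemble, then take the argmax.
new_rows = X[:5]  # stand-in for new data
avg_proba = np.mean([m.predict_proba(new_rows) for m in models], axis=0)
print(avg_proba.argmax(axis=1))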
Summary
In this post, you discovered how to train a final machine learning model for operational use.
You have overcome obstacles to finalizing your model, such as:
- Understanding the goal of resampling procedures such as train-test splits and k-fold cross validation.
- Model finalization as training a new model on all available data.
- Separating the concern of estimating performance from finalizing the model.
Do you have another question or concern about finalizing your model that I have not addressed? Ask in the comments and I will do my best to help.