Keras: 5 Step Life-Cycle for Long Short-Term Memory Model

Source: https://machinelearningmastery.com/5-step-life-cycle-long-short-term-memory-models-keras/

Deep learning neural networks are very easy to create and evaluate in Python with Keras, but you must follow a strict model life-cycle.

In this post, you will discover the step-by-step life-cycle for creating, training, and evaluating Long Short-Term Memory (LSTM) recurrent neural networks in Keras, and how to make predictions with a trained model.

After reading this post, you will know:

How to define, compile, fit, and evaluate an LSTM in Keras.
How to select standard defaults for regression and classification sequence prediction problems.
How to tie it all together to develop and run an LSTM recurrent neural network in Keras.

Overview

Below is an overview of the 5 steps in the LSTM model life-cycle in Keras that we are going to look at.

Define Network
Compile Network
Fit Network
Evaluate Network
Make Predictions

Environment

To follow this tutorial, you need:

Python 2 or 3
TensorFlow or Theano
SciPy, scikit-learn, Pandas, NumPy, and Matplotlib

Step 1. Define Network

Neural networks are defined in Keras as a sequence of layers. The container for these layers is the Sequential class.

The first step is to create an instance of the Sequential class. You can then create layers and add them in the order in which they should be connected. The LSTM recurrent layer, made up of memory units, is called LSTM(). A fully connected layer that often follows LSTM layers and is used to output a prediction is called Dense().

For example, we can do this in two steps:

model = Sequential()
model.add(LSTM(2))
model.add(Dense(1))

Alternatively, we can do this in one step by creating an array of layers and passing it to the constructor of the Sequential class.

layers = [LSTM(2), Dense(1)]
model = Sequential(layers)

The first layer in the network must define the number of inputs to expect. Input must be three-dimensional, comprised of samples, timesteps, and features.

Samples. These are the rows in your data.
Timesteps. These are the past observations for a feature, such as lag variables.
Features. These are the columns in your data.

Assuming your data is loaded as a NumPy array, you can convert a 2D dataset to a 3D dataset using the reshape() function in NumPy. If you would like the columns to become timesteps for one feature, you can use:

data = data.reshape((data.shape[0], data.shape[1], 1))

If you would like the columns in your 2D data to become features with one timestep, you can use:

data = data.reshape((data.shape[0], 1, data.shape[1]))
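
The two reshapes can be checked on a small array. The following is a minimal NumPy sketch; the 3x2 array named data and its values are made up for illustration.

```python
import numpy as np

# Hypothetical 2D dataset: 3 rows (samples) and 2 columns.
data = np.array([[0.1, 0.2],
                 [0.2, 0.3],
                 [0.3, 0.4]])

# Columns become timesteps of a single feature: (samples, timesteps, 1).
as_timesteps = data.reshape((data.shape[0], data.shape[1], 1))

# Columns become features of a single timestep: (samples, 1, features).
as_features = data.reshape((data.shape[0], 1, data.shape[1]))

print(as_timesteps.shape)  # (3, 2, 1)
print(as_features.shape)   # (3, 1, 2)
```

In both cases the values are unchanged; only the way the second and third axes are interpreted differs.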

You can specify the input_shape argument, which expects a tuple containing the number of timesteps and the number of features. For example, if we had two timesteps and one feature for a univariate time series with two lag observations per row, it would be specified as follows:

model = Sequential()
model.add(LSTM(5, input_shape=(2,1)))
model.add(Dense(1))
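
The input_shape tuple corresponds to the last two dimensions of the input array. As a minimal sketch, assuming a made-up univariate series and a hypothetical helper named make_lag_rows (neither is part of the original tutorial), the following builds rows of two lag observations each and confirms that the resulting array matches input_shape=(2,1).

```python
import numpy as np

def make_lag_rows(series, n_lags):
    """Hypothetical helper: each row holds n_lags consecutive observations,
    and the target is the observation that follows them."""
    rows = [series[i:i + n_lags] for i in range(len(series) - n_lags)]
    targets = [series[i + n_lags] for i in range(len(series) - n_lags)]
    return np.array(rows), np.array(targets)

series = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # illustrative values
X, y = make_lag_rows(series, n_lags=2)

# Reshape to (samples, timesteps, features) with one feature per timestep.
X = X.reshape((X.shape[0], X.shape[1], 1))

print(X.shape)      # (4, 2, 1)
print(X.shape[1:])  # (2, 1), the tuple passed as input_shape
print(y)            # [0.2 0.3 0.4 0.5]
```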

LSTM layers can be stacked by adding them to the Sequential model. Importantly, when stacking LSTM layers, each earlier layer must output a sequence rather than a single value for each input, so that the subsequent LSTM layer receives the required 3D input. We can do this by setting the return_sequences argument to True. For example:

model = Sequential()
model.add(LSTM(5, input_shape=(2,1), return_sequences=True))
model.add(LSTM(5))
model.add(Dense(1))
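
Why the first layer must return a sequence can be seen from the array shapes alone. The sketch below is not an LSTM: it swaps in a plain tanh recurrent layer written in NumPy, with a hypothetical function named simple_recurrent and random weights, purely to show how the output shape changes with a return_sequences flag.

```python
import numpy as np

def simple_recurrent(x, units, return_sequences=False, seed=0):
    """Toy tanh recurrent layer (a stand-in, not the Keras LSTM).

    x has shape (samples, timesteps, features).
    """
    rng = np.random.default_rng(seed)
    samples, timesteps, features = x.shape
    w_in = rng.normal(size=(features, units))   # input weights
    w_rec = rng.normal(size=(units, units))     # recurrent weights
    state = np.zeros((samples, units))
    outputs = []
    for t in range(timesteps):
        state = np.tanh(x[:, t, :] @ w_in + state @ w_rec)
        outputs.append(state)
    if return_sequences:
        # One output per timestep: (samples, timesteps, units), still 3D.
        return np.stack(outputs, axis=1)
    # Only the final output: (samples, units), which is 2D.
    return state

x = np.zeros((4, 2, 1))  # 4 samples, 2 timesteps, 1 feature
print(simple_recurrent(x, 5, return_sequences=True).shape)  # (4, 2, 5)
print(simple_recurrent(x, 5).shape)                          # (4, 5)
```

Only the 3D output can be fed into another recurrent layer, which is why every layer except the last in a stack needs to return the full sequence.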

Think of a Sequential model as a pipeline, with your raw data fed in at one end and predictions coming out at the other.

This is a helpful container in Keras because concerns that were traditionally associated with a layer can also be split out and added as separate layers, clearly showing their role in the transformation of data from input to prediction.

For example, activation functions that transform the summed signal from each neuron in a layer can be extracted and added to the Sequential as a layer-like object called Activation.

model = Sequential()
model.add(LSTM(5, input_shape=(2,1)))
model.add(Dense(1))
model.add(Activation('sigmoid'))

The choice of activation function is most important for the output layer, as it defines the format that predictions will take.

For example, below are some common predictive modeling problem types and the structure and standard activation function that you can use in the output layer:

Regression: Linear activation function, or 'linear', and the number of neurons matching the number of outputs.
Binary Classification (2 class): Logistic activation function, or 'sigmoid', and one neuron in the output layer.
Multiclass Classification (>2 class): Softmax activation function, or 'softmax', and one output neuron per class value, assuming a one-hot encoded output pattern.
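
To make the effect of each output activation concrete, here is a minimal NumPy sketch of the three functions applied to the same made-up raw outputs. The function definitions are textbook formulas written for illustration, not the Keras implementations.

```python
import numpy as np

def linear(z):
    # Identity: raw values pass through unchanged.
    return z

def sigmoid(z):
    # Logistic function: squashes each value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

raw = np.array([[2.0, 0.0, -2.0]])  # illustrative pre-activation values

print(linear(raw))   # unchanged real values, as used for regression
print(sigmoid(raw))  # each value squashed independently into (0, 1)
print(softmax(raw))  # each row sums to 1, one probability per class
```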

Step 2. Compile Network

Compilation is an efficiency step. It transforms the simple sequence of layers that we defined into a highly efficient series of matrix transforms, in a format intended to be executed on your GPU or CPU, depending on how Keras is configured.

Think of compilation as a precompute step for your network. It is always required after defining a model.

Compilation requires a number of parameters to be specified, specifically tailored to training your network. In particular, you specify the optimization algorithm used to train the network and the loss function that the optimization algorithm minimizes and that is used to evaluate the network.

For example, below is a case of compiling a defined model and specifying the stochastic gradient descent (sgd) optimization algorithm and the mean squared error (mean_squared_error) loss function, intended for a regression type problem.

model.compile(optimizer='sgd', loss='mean_squared_error')

Alternatively, the optimizer can be created and configured before being provided as an argument to the compilation step.

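from keras.optimizers import SGD
# newer Keras releases name the first argument learning_rate instead of lr
algorithm = SGD(lr=0.1, momentum=0.3)
model.compile(optimizer=algorithm, loss='mean_squared_error')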

The type of predictive modeling problem imposes constraints on the type of loss function that can be used.

For example, below are some standard loss functions for different predictive model types:

Regression: Mean Squared Error, or 'mean_squared_error'.
Binary Classification (2 class): Logarithmic Loss, also called cross entropy, or 'binary_crossentropy'.
Multiclass Classification (>2 class): Multiclass Logarithmic Loss, or 'categorical_crossentropy'.
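
The three loss functions can be written out in a few lines each. The following is a simplified NumPy sketch of the standard formulas, with made-up inputs; the clipping constant eps is an assumption added to avoid log(0), and the code is not the Keras implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of squared differences between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, p, eps=1e-7):
    # p holds the predicted probability of class 1 for each sample.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_crossentropy(y_true_onehot, p, eps=1e-7):
    # p holds one predicted probability per class for each sample.
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=-1))

mse = mean_squared_error(np.array([0.1, 0.2]), np.array([0.1, 0.4]))
bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
cce = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                               np.array([[0.25, 0.5, 0.25]]))
print(mse, bce, cce)
```

With these inputs both cross entropy values equal the natural logarithm of 2, because in each case the model assigns probability 0.5 to the correct class.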

The most common optimization algorithm is stochastic gradient descent, but Keras also supports a suite of other state-of-the-art optimization algorithms that work well with little or no configuration.

Perhaps the most commonly used optimization algorithms, because of their generally better performance, are:

Stochastic Gradient Descent, or 'sgd', which requires tuning of the learning rate and momentum.
ADAM, or 'adam', which requires tuning of the learning rate.
RMSprop, or 'rmsprop', which requires tuning of the learning rate.
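
To illustrate what the learning rate and momentum control, here is a minimal sketch of the classical momentum update on a one-parameter quadratic loss. The toy loss, the starting point, and the function name sgd_momentum_step are assumptions for illustration; the optimizers in Keras are more elaborate.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.3):
    """One classical momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def loss(w):
    return (w - 3.0) ** 2   # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of the toy loss

w, velocity = 0.0, 0.0
losses = [loss(w)]
for _ in range(20):
    w, velocity = sgd_momentum_step(w, velocity, grad(w))
    losses.append(loss(w))

print(round(w, 3))            # close to 3.0
print(losses[0], losses[-1])  # the loss shrinks from 9.0 toward 0
```

A larger learning rate takes bigger steps along the gradient, while momentum carries part of the previous step forward; both typically need tuning for a real network.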

Finally, you can also specify metrics to collect while fitting your model, in addition to the loss function. Generally, the most useful additional metric to collect is accuracy for classification problems. The metrics to collect are specified by name in an array.

For example:

model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])

Step 3. Fit Network

Once the network is compiled, it can be fit, which means adapting the weights on a training dataset.

Fitting the network requires the training data to be specified, both a matrix of input patterns, X, and an array of matching output patterns, y.

The network is trained using the backpropagation algorithm and optimized according to the optimization algorithm and loss function specified when compiling the model.

The backpropagation algorithm requires that the network be trained for a specified number of epochs, where each epoch is one exposure to the entire training dataset.

Each epoch can be partitioned into groups of input-output pattern pairs called batches. This defines the number of patterns that the network is exposed to before the weights are updated within an epoch. It is also an efficiency optimization, ensuring that not too many input patterns are loaded into memory at a time.

A minimal example of fitting a network is as follows:

history = model.fit(X, y, batch_size=10, epochs=100)
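
The relationship between patterns, batch size, and epochs comes down to simple arithmetic. A minimal sketch, assuming a hypothetical training set of 95 patterns and assuming the final, smaller batch of each epoch is still processed:

```python
import math

def updates_per_epoch(n_patterns, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(n_patterns / batch_size)

n_patterns = 95   # hypothetical number of training patterns
batch_size = 10
epochs = 100

per_epoch = updates_per_epoch(n_patterns, batch_size)
print(per_epoch)           # 10 (nine full batches plus one batch of 5)
print(per_epoch * epochs)  # 1000 weight updates over the whole run
```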

Once fit, a history object is returned that provides a summary of the performance of the model during training. This includes both the loss and any additional metrics specified when compiling the model, recorded each epoch.
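
The returned object exposes the recorded values through its history attribute, a dictionary that maps each metric name to a list with one value per epoch. The sketch below uses a hand-written dictionary with invented numbers as a stand-in for history.history, so that it runs without training a model.

```python
# Stand-in for history.history; the numbers are invented for illustration.
recorded = {
    'loss': [0.90, 0.40, 0.15, 0.08, 0.09],
}

losses = recorded['loss']
# Index of the epoch with the lowest recorded loss (0-based).
best_epoch = min(range(len(losses)), key=losses.__getitem__)

print(len(losses))     # 5, one entry per epoch
print(best_epoch + 1)  # 4, the epoch with the lowest recorded loss
print(losses[-1])      # 0.09, the loss after the final epoch
```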

Training can take a long time, from seconds to hours to days depending on the size of the network and the size of the training data.

By default, a progress bar is displayed on the command line for each epoch. This may create too much noise for you, or may cause problems for your environment, such as if you are in an interactive notebook or IDE.

You can reduce the amount of information displayed to just the loss each epoch by setting the verbose argument to 2. You can turn off all output by setting verbose to 0. For example:

history = model.fit(X, y, batch_size=10, epochs=100, verbose=0)

Step 4. Evaluate Network

Once the network is trained, it can be evaluated.

The network can be evaluated on the training data, but this will not provide a useful indication of the performance of the network as a predictive model, as it has seen all of this data before.

We can evaluate the performance of the network on a separate dataset, unseen during training. This will provide an estimate of the performance of the network at making predictions for unseen data in the future.

The model evaluates the loss across all of the test patterns, as well as any other metrics specified when the model was compiled, like classification accuracy. A list of evaluation metrics is returned.

For example, for a model compiled with the accuracy metric, we could evaluate it on a new dataset as follows:

loss, accuracy = model.evaluate(X, y)

As with fitting the network, verbose output is provided to give an idea of the progress of evaluating the model. We can turn this off by setting the verbose argument to 0.

loss, accuracy = model.evaluate(X, y, verbose=0)
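
Step 4 recommends evaluating on data the network has not been trained on. Below is a minimal sketch of holding out such data, assuming the samples are ordered in time so that the most recent ones form the test set. The array contents, the 80/20 split, and the name split_train_test are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.2):
    """Hold out the last part of the data, keeping the time order intact."""
    n_test = max(1, int(len(X) * test_fraction))
    return X[:-n_test], y[:-n_test], X[-n_test:], y[-n_test:]

# 10 samples, each with 1 timestep and 1 feature.
X = np.arange(10, dtype=float).reshape((10, 1, 1))
y = np.arange(1, 11, dtype=float)

X_train, y_train, X_test, y_test = split_train_test(X, y)
print(X_train.shape, X_test.shape)  # (8, 1, 1) (2, 1, 1)
```

The model would then be fit on X_train and y_train, and model.evaluate() would be called with X_test and y_test.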

Step 5. Make Predictions

Once we are satisfied with the performance of our fit model, we can use it to make predictions on new data.

This is as easy as calling the predict() function on the model with an array of new input patterns.

For example:

predictions = model.predict(X)

The predictions will be returned in the format provided by the output layer of the network.

In the case of a regression problem, these predictions may be in the format of the problem directly, provided by a linear activation function.

For a binary classification problem, the predictions may be an array of probabilities for the first class that can be converted to a 1 or 0 by rounding.

For a multiclass classification problem, the results may be in the form of an array of probabilities (assuming a one hot encoded output variable) that may need to be converted to a single class output prediction using the argmax() NumPy function.
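
Both conversions can be sketched with NumPy. The probability arrays below are made up to stand in for the output of predict(), and the binary case uses a 0.5 threshold, which matches rounding for any value not exactly equal to 0.5.

```python
import numpy as np

# Illustrative sigmoid outputs for a binary problem, shape (samples, 1).
binary_probs = np.array([[0.91], [0.12], [0.63]])
binary_classes = (binary_probs > 0.5).astype(int).ravel()
print(binary_classes)  # [1 0 1]

# Illustrative softmax outputs for a 3-class problem, shape (samples, classes).
multi_probs = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
multi_classes = np.argmax(multi_probs, axis=1)
print(multi_classes)   # [0 2]
```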

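Alternatively, for classification problems, older Keras releases provide the predict_classes() function on Sequential models, which automatically converts uncrisp predictions to crisp integer class values. This function was removed in later TensorFlow/Keras releases, where rounding or argmax() on the output of predict() is used instead.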

predictions = model.predict_classes(X)

As with fitting and evaluating the network, verbose output is provided to give an idea of the progress of the model making predictions. We can turn this off by setting the verbose argument to 0.

predictions = model.predict(X, verbose=0)

End-to-End Worked Example

Let’s tie all of this together with a small worked example.

This example will use a simple problem of learning a sequence of 10 numbers. We will show the network a number, such as 0.0 and expect it to predict 0.1. Then show it 0.1 and expect it to predict 0.2, and so on to 0.9.

   Define Network: We will construct an LSTM neural network with 1 input timestep and 1 input feature in the visible layer, 10 memory units in the LSTM hidden layer, and 1 neuron in the fully connected output layer with a linear (default) activation function.
   Compile Network: We will use the efficient ADAM optimization algorithm with default configuration and the mean squared error loss function because it is a regression problem.
   Fit Network: We will fit the network for 1,000 epochs and use a batch size equal to the number of patterns in the training set. We will also turn off all verbose output.
   Evaluate Network: We will evaluate the network on the training dataset. Typically we would evaluate the model on a test or validation set.
   Make Predictions: We will make predictions for the training input data. Again, typically we would make predictions on data where we do not know the right answer.

The complete code listing is provided below.

# Example of LSTM to learn a sequence
from pandas import DataFrame
from pandas import concat
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# create sequence
length = 10
sequence = [i/float(length) for i in range(length)]
print(sequence)
# create X/y pairs
df = DataFrame(sequence)
df = concat([df.shift(1), df], axis=1)
df.dropna(inplace=True)
# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)
# 1. define network
model = Sequential()
model.add(LSTM(10, input_shape=(1,1)))
model.add(Dense(1))
# 2. compile network
model.compile(optimizer='adam', loss='mean_squared_error')
# 3. fit network
history = model.fit(X, y, epochs=1000, batch_size=len(X), verbose=0)
# 4. evaluate network
loss = model.evaluate(X, y, verbose=0)
print(loss)
# 5. make predictions
predictions = model.predict(X, verbose=0)
print(predictions[:, 0])

Running this example produces the following output, showing the raw input sequence of 10 numbers, the mean squared error loss of the network when making predictions for the entire sequence, and the predictions for each input pattern.

Outputs were spaced out for readability.

We can see the sequence is learned well, especially if we round predictions to the first decimal place.

[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

4.54527471447e-05

[ 0.11612834 0.20493418 0.29793766 0.39445466 0.49376178 0.59512401
0.69782174 0.80117452 0.90455914]
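
The reported numbers can be cross-checked without retraining. The sketch below copies the predictions from the output listed above, rounds them to one decimal place, and recomputes the mean squared error against the expected targets 0.1 to 0.9. Results from a fresh training run will differ slightly because of random weight initialization.

```python
import numpy as np

# Predictions copied from the output listed above.
predictions = np.array([0.11612834, 0.20493418, 0.29793766, 0.39445466,
                        0.49376178, 0.59512401, 0.69782174, 0.80117452,
                        0.90455914])
# Targets: each input 0.0 ... 0.8 should map to the next value 0.1 ... 0.9.
expected = np.arange(1, 10) / 10.0

rounded = np.round(predictions, 1)
mse = np.mean((predictions - expected) ** 2)

print(rounded)  # [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
print(mse)      # approximately 4.545e-05, in line with the reported loss
```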

Summary

In this post, you discovered the 5-step life-cycle of an LSTM recurrent neural network using the Keras library.

Specifically, you learned:

How to define, compile, fit, evaluate, and make predictions for an LSTM network in Keras.
How to select activation functions and output layer configurations for classification and regression problems.
How to develop and run your first LSTM model in Keras.

References

https://machinelearningmastery.com/5-step-life-cycle-long-short-term-memory-models-keras/

Related Links

Keras
Python