Orange: Stochastic Gradient Descent

The Stochastic Gradient Descent widget minimizes an objective function using a stochastic approximation of gradient descent.

==Input==

* Data: input dataset
* Preprocessor: preprocessing method(s)

==Output==

* Learner: stochastic gradient descent learning algorithm
* Model: trained model

The Stochastic Gradient Descent widget uses stochastic gradient descent to minimize a chosen loss function with a linear function. The algorithm approximates the true gradient by considering one sample at a time, and simultaneously updates the model based on the gradient of the loss function. For regression, the widget returns predictors as minimizers of the sum, i.e., M-estimators, and it is especially useful for large-scale and sparse datasets.
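
To make the per-sample update concrete, below is a minimal NumPy sketch of plain SGD with squared loss on a toy linear problem. The data, learning rate, and epoch count are hypothetical; this illustrates the update rule only and is not the widget's actual implementation:

<syntaxhighlight lang="python">
# Minimal sketch of the per-sample SGD update (squared loss, linear model).
# Toy data; illustrative only, not the widget's implementation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # hypothetical toy inputs
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

w = np.zeros(3)                          # model weights
eta = 0.01                               # constant learning rate
for epoch in range(20):                  # passes (epochs) over the data
    for i in rng.permutation(len(X)):    # shuffle after each pass
        residual = X[i] @ w - y[i]       # prediction error on a single sample
        w -= eta * residual * X[i]       # step along that sample's gradient

print(w)                                 # approaches w_true
</syntaxhighlight>
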
[[File:StochasticGradientDescent-stamped.png|center|200px|thumb]]

* Specify the name of the model. The default name is “SGD”.
* Algorithm parameters:
** Classification loss function:
*** Hinge (linear SVM)
*** Logistic Regression (logistic regression SGD)
*** Modified Huber (smooth loss that brings tolerance to outliers as well as probability estimates)
*** Squared Hinge (quadratically penalized hinge)
*** Perceptron (linear loss used by the perceptron algorithm)
*** Squared Loss (fitted to ordinary least-squares)
*** Huber (switches to linear loss beyond ε)
*** Epsilon insensitive (ignores errors within ε, linear beyond it)
*** Squared epsilon insensitive (loss is squared beyond the ε-region)
** Regression loss function:
*** Squared Loss (fitted to ordinary least-squares)
*** Huber (switches to linear loss beyond ε)
*** Epsilon insensitive (ignores errors within ε, linear beyond it)
*** Squared epsilon insensitive (loss is squared beyond the ε-region)
* Regularization norms to prevent overfitting:
** None
** Lasso (L1) (L1 penalty, leading to sparse solutions)
** Ridge (L2) (L2 penalty, the standard regularizer)
** Elastic net (mixing both penalty norms)
* Regularization strength defines how much regularization is applied (the less we regularize, the more we allow the model to fit the data). The mixing parameter sets the ratio between the L1 and L2 penalties: 0 means a pure L2 penalty, 1 a pure L1 penalty. How these options map onto scikit-learn parameters is sketched after this list.
* Learning parameters:
** Learning rate:
*** Constant: the learning rate stays the same through all epochs (passes)
*** Optimal: a heuristic proposed by Léon Bottou
*** Inverse scaling: the learning rate is inversely related to the number of iterations
** Initial learning rate.
** Inverse scaling exponent: learning rate decay.
** Number of iterations: the number of passes through the training data.
** If Shuffle data after each iteration is on, the order of data instances is mixed after each pass.
** If Fixed seed for random shuffling is on, the algorithm will use a fixed random seed and enable replicating the results.
* Produce a report.
* Press Apply to commit changes. Alternatively, tick the box on the left side of the Apply button and changes will be communicated automatically.
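
The widget's options mirror the parameters of scikit-learn's SGDClassifier/SGDRegressor, on which Orange's models are largely built. A rough sketch of an equivalent configuration follows; the parameter values are illustrative, not necessarily the widget's defaults:

<syntaxhighlight lang="python">
# Rough scikit-learn counterpart of the widget's settings; values are
# illustrative and may differ from the widget's exact defaults.
from sklearn.linear_model import SGDClassifier

learner = SGDClassifier(
    loss="hinge",                # Classification loss function: Hinge (linear SVM)
    penalty="elasticnet",        # Regularization: Elastic net
    alpha=0.0001,                # Regularization strength
    l1_ratio=0.15,               # Mixing parameter: 0 = pure L2, 1 = pure L1
    learning_rate="invscaling",  # Constant / Optimal / Inverse scaling
    eta0=0.01,                   # Initial learning rate
    power_t=0.25,                # Inverse scaling exponent (decay)
    max_iter=1000,               # Number of iterations (passes)
    shuffle=True,                # Shuffle data after each iteration
    random_state=42,             # Fixed seed for random shuffling
)
</syntaxhighlight>
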
==Examples==

− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
For the classification task, we will use the iris dataset and test two models on it. We connect the Stochastic Gradient Descent widget and the Tree widget to the Test & Score widget. We also connect the File widget to Test & Score and observe the models' performance there.

[[File:StochasticGradientDescent-classification.png|center|600px|thumb]]

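Outside the canvas, the same comparison can be sketched with scikit-learn estimators. Note this is an approximation: DecisionTreeClassifier stands in for Orange's Tree widget, which uses Orange's own tree implementation, and 10-fold cross-validation stands in for Test & Score:

<syntaxhighlight lang="python">
# Sketch of the Test & Score comparison on iris, using scikit-learn
# stand-ins for the two widgets.
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for name, model in [
    ("SGD", SGDClassifier(random_state=42)),
    ("Tree", DecisionTreeClassifier(random_state=42)),
]:
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
</syntaxhighlight>
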
For the regression task, we will compare three different models to see what kind of predictions each of them produces. The housing dataset is used for this purpose. We connect the File widget to the Stochastic Gradient Descent, Linear Regression, and kNN widgets, and connect all of them to the Predictions widget.

[[File:StochasticGradientDescent-regression.png|center|600px|thumb]]

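A comparable script with scikit-learn stand-ins is sketched below. Orange's housing dataset is the Boston housing data, which recent scikit-learn releases no longer ship, so the California housing dataset is used here instead; SGD also benefits from feature scaling, hence the pipeline:

<syntaxhighlight lang="python">
# Sketch of the three-model comparison feeding a Predictions-style view;
# California housing stands in for Orange's housing dataset.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
models = {
    "SGD": make_pipeline(StandardScaler(), SGDRegressor(random_state=42)),
    "Linear Regression": LinearRegression(),
    "kNN": KNeighborsRegressor(),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, model.predict(X[:3]))  # predictions for the first few instances
</syntaxhighlight>
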
==References==

* Source: https://docs.biolab.si//3/visual-programming/widgets/model/stochasticgradient.html