Orange: Stochastic Gradient Descent

Revision as of 11:21, 28 January 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/model/stochasticgradient.html


Minimize an objective function using a stochastic approximation of gradient descent.

Input

Data: input dataset
Preprocessor: preprocessing method(s)

Output

Learner: stochastic gradient descent learning algorithm
Model: trained model

The Stochastic Gradient Descent widget fits a linear model by minimizing a chosen loss function with stochastic gradient descent. The algorithm approximates the true gradient by considering one sample at a time and updates the model after each sample, based on the gradient of the loss on that sample. For regression, it returns predictors as minimizers of a sum of per-sample losses, i.e. M-estimators. The method is especially useful for large-scale and sparse datasets.
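The per-sample update described above can be illustrated with a minimal sketch in plain Python. This is not the widget's implementation: the function name `sgd_linear_regression`, the toy data, and the choice of squared loss with a constant learning rate are all assumptions made for illustration.

```python
import random

def sgd_linear_regression(samples, targets, learning_rate=0.01, epochs=50, seed=0):
    """Fit y ~ w.x + b with squared loss, updating after every single sample.

    Hypothetical helper for illustration; not part of Orange.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each pass
        for i in order:
            x, y = samples[i], targets[i]
            prediction = sum(w * xj for w, xj in zip(weights, x)) + bias
            # Derivative of 0.5 * (prediction - y)**2 with respect to the prediction
            error = prediction - y
            # Gradient step computed from this one sample only
            weights = [w - learning_rate * error * xj for w, xj in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Toy, noise-free data generated from y = 2*x1 - 3*x2 + 1 (an assumption for the demo)
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [2 * a - 3 * b + 1 for a, b in X]

weights, bias = sgd_linear_regression(X, y, learning_rate=0.05, epochs=100)
```

On this noise-free toy problem the estimated `weights` and `bias` should end up close to the generating coefficients. With noisy data, a constant learning rate typically leaves the estimate fluctuating around the minimizer, which is one motivation for decaying learning-rate schedules.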

StochasticGradientDescent-stamped.png
  • Specify the name of the model. The default name is “SGD”.
  • Algorithm parameters:
    • Classification loss function:
      • Hinge (linear SVM)
      • Logistic Regression (logistic regression SGD)
      • Modified Huber (smooth loss that brings tolerance to outliers as well as probability estimates)
      • Squared Hinge (quadratically penalized hinge)
      • Perceptron (linear loss used by the perceptron algorithm)
      • Squared Loss (fitted to ordinary least-squares)
      • Huber (switches to linear loss beyond ε)
      • Epsilon insensitive (ignores errors within ε, linear beyond it)
      • Squared epsilon insensitive (loss is squared beyond ε-region).
    • Regression loss function:
      • Squared Loss (fitted to ordinary least-squares)
      • Huber (switches to linear loss beyond ε)
      • Epsilon insensitive (ignores errors within ε, linear beyond it)
      • Squared epsilon insensitive (loss is squared beyond ε-region).
  • Regularization norms to prevent overfitting:
    • None.
    • Lasso (L1): tends to produce sparse solutions
    • Ridge (L2): the standard regularizer
    • Elastic net (mixing both penalty norms).
  • Regularization strength defines how much regularization is applied (the less we regularize, the more closely the model is allowed to fit the data). The mixing parameter sets the ratio between the L1 and L2 penalties (if set to 0, the penalty is pure L2; if set to 1, it is pure L1).
  • Learning parameters.
    • Learning rate:
      • Constant: learning rate stays the same through all epochs (passes)
      • Optimal: a heuristic proposed by Léon Bottou
      • Inverse scaling: the learning rate is inversely related to the number of iterations
    • Initial learning rate.
    • Inverse scaling exponent: controls how quickly the learning rate decays.
    • Number of iterations: the number of passes through the training data.
    • If Shuffle data after each iteration is on, the order of data instances is mixed after each pass.
    • If Fixed seed for random shuffling is on, the algorithm will use a fixed random seed and enable replicating the results.
  • Produce a report.
  • Press Apply to commit changes. Alternatively, tick the box on the left side of the Apply button and changes will be communicated automatically.
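Several of the options listed above can be written out as small functions. The sketch below uses common textbook forms of the losses, the elastic net penalty, and the inverse scaling schedule. The exact scaling constants used inside the widget are an assumption here, and all function names are hypothetical.

```python
def hinge(margin):
    """Hinge loss on the margin y * f(x), with labels y in {-1, +1}."""
    return max(0.0, 1.0 - margin)

def squared_hinge(margin):
    """Quadratically penalized hinge."""
    return max(0.0, 1.0 - margin) ** 2

def huber(residual, epsilon):
    """Quadratic for |residual| <= epsilon, linear beyond it."""
    r = abs(residual)
    if r <= epsilon:
        return 0.5 * r * r
    return epsilon * r - 0.5 * epsilon * epsilon

def epsilon_insensitive(residual, epsilon):
    """Ignores errors within epsilon, grows linearly beyond it."""
    return max(0.0, abs(residual) - epsilon)

def squared_epsilon_insensitive(residual, epsilon):
    """Ignores errors within epsilon, grows quadratically beyond it."""
    return max(0.0, abs(residual) - epsilon) ** 2

def elastic_net_penalty(weights, strength, mixing):
    """Mix of L1 and L2 penalties: mixing=0 is pure L2, mixing=1 is pure L1."""
    l1 = sum(abs(w) for w in weights)
    l2 = 0.5 * sum(w * w for w in weights)
    return strength * (mixing * l1 + (1.0 - mixing) * l2)

def inverse_scaling_rate(initial_rate, step, exponent):
    """Learning rate at update number `step` (step >= 1) under inverse scaling."""
    return initial_rate / (step ** exponent)
```

For example, `huber(3.0, 1.0)` falls in the linear region, while `huber(0.5, 1.0)` is still quadratic. Setting `mixing` to 0 or 1 in `elastic_net_penalty` recovers the Ridge and Lasso penalties respectively.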

Example

For the classification task, we use the iris dataset and test two models on it. We connect Stochastic Gradient Descent and Tree to Test & Score, connect File to Test & Score as well, and observe the models' performance in that widget.

StochasticGradientDescent-classification.png

For the regression task, we compare three different models to see what each one predicts. This example uses the housing dataset. We connect the File widget to the Stochastic Gradient Descent, Linear Regression and kNN widgets, and then connect all four to the Predictions widget.

StochasticGradientDescent-regression.png



References

See also