Stochastic Gradient Descent (SGD) and Gradient Descent (GD)

Both algorithms find a set of parameters that minimizes a loss function. Each one repeatedly evaluates the current parameters against training data, then adjusts them in the direction that reduces the loss.

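In standard (batch) gradient descent, you evaluate all training samples before each update to the parameters. This is akin to taking big, slow steps toward the solution: each step is expensive to compute, but it follows the exact gradient of the training loss.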
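As a rough illustration of the full-batch update, here is a minimal sketch in plain Python. The model (a line y = w*x + b), the mean squared error loss, the toy data, and names such as `batch_gradient_descent`, `lr`, and `epochs` are all assumptions chosen for this example, not part of any particular library.

```python
def batch_gradient_descent(xs, ys, lr=0.05, epochs=1000):
    """Fit y ~ w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        # Accumulate the gradient over ALL training samples...
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # ...and only then update the parameters, once per pass over the data.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(w, b)  # should land close to w = 2, b = 1
```

Note that the parameters change only once per full pass over the data, which is why each step is costly on large datasets.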

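In stochastic gradient descent, you evaluate only one training sample (typically chosen at random) before updating the parameters. This is akin to taking small, quick steps toward the solution: each step is cheap but noisy, so the path zigzags while still heading toward the minimum on average.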
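The per-sample update can be sketched in plain Python as follows. As before, the linear model y = w*x + b, the squared error loss, the toy data, and names such as `stochastic_gradient_descent`, `lr`, `epochs`, and `seed` are assumptions made for illustration only.

```python
import random


def stochastic_gradient_descent(xs, ys, lr=0.02, epochs=1000, seed=0):
    """Fit y ~ w*x + b, updating the parameters after every single sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a fresh random order
        for i in indices:
            # Gradient of the squared error for ONE sample only.
            error = (w * xs[i] + b) - ys[i]
            # Update immediately, without waiting for the other samples.
            w -= lr * 2.0 * error * xs[i]
            b -= lr * 2.0 * error
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = stochastic_gradient_descent(xs, ys)
print(w, b)  # should land close to w = 2, b = 1
```

Here the parameters change after every sample, so one pass over five samples makes five updates. Because this toy data is noise-free, the estimates settle on the exact line; with noisy data and a constant learning rate, they would typically keep jittering around the minimum instead.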