Stochastic Gradient Descent is an optimization algorithm that can be used to train neural network models.
The Stochastic Gradient Descent algorithm requires the gradient of the loss to be calculated for each variable in the model so that new values for the variables can be computed.
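For example, the core of the Stochastic Gradient Descent update is a single step against each gradient, scaled by a learning rate. A minimal sketch in plain Python, where the parameter values, gradient values, and learning rate are made up purely for illustration:

```python
# Minimal sketch of one SGD update step (illustrative values only).
def sgd_update(params, grads, learning_rate=0.1):
    # Move each parameter a small step in the direction that reduces the loss.
    return [p - learning_rate * g for p, g in zip(params, grads)]

params = [0.5, -1.2, 3.0]           # current model parameters
grads = [0.1, -0.4, 0.25]           # gradients of the loss w.r.t. each parameter
params = sgd_update(params, grads)  # updated parameters: [0.49, -1.16, 2.975]
```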
Back-propagation is an automatic differentiation algorithm that can be used to calculate the gradients for the parameters in neural networks.
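As a concrete illustration, back-propagation is a systematic application of the chain rule. Below is a hand-worked sketch for a single neuron with a sigmoid activation and a squared-error loss; the input, target, and parameter values are invented for the example:

```python
import math

# Forward pass for one neuron: prediction = sigmoid(w * x + b), loss = (prediction - target)^2
x, target = 1.5, 1.0   # one training example (invented values)
w, b = 0.8, -0.2       # current parameters

z = w * x + b
pred = 1.0 / (1.0 + math.exp(-z))
loss = (pred - target) ** 2

# Backward pass: apply the chain rule from the loss back to each parameter.
dloss_dpred = 2.0 * (pred - target)
dpred_dz = pred * (1.0 - pred)   # derivative of the sigmoid
dloss_dz = dloss_dpred * dpred_dz
dloss_dw = dloss_dz * x          # gradient for the weight
dloss_db = dloss_dz * 1.0        # gradient for the bias
print(dloss_dw, dloss_db)
```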
Together, the back-propagation algorithm and Stochastic Gradient Descent algorithm can be used to train a neural network. We might call this “Stochastic Gradient Descent with Back-propagation.”
- Stochastic Gradient Descent With Back-propagation: A more complete description of the general algorithm used to train a neural network, referencing both the optimization algorithm and the gradient calculation algorithm (a short sketch of the combination follows below).
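Putting the two pieces together, the sketch below shows "Stochastic Gradient Descent with Back-propagation" for a single-neuron model on a toy dataset. The data, learning rate, and number of epochs are arbitrary choices for illustration; the point is how the back-propagation step produces the gradients that the SGD step consumes:

```python
import math
import random

# Toy dataset: target is 1.0 when x is positive, 0.0 otherwise (invented data).
data = [(-2.0, 0.0), (-0.5, 0.0), (0.5, 1.0), (2.0, 1.0)]

w, b = random.uniform(-1.0, 1.0), 0.0   # initial parameters
learning_rate = 0.5

for epoch in range(100):
    random.shuffle(data)   # "stochastic": visit the examples in a random order
    for x, y in data:
        # Forward pass through the single-neuron model.
        pred = 1.0 / (1.0 + math.exp(-(w * x + b)))
        # Back-propagation: the chain rule gives the gradients of the squared error.
        dloss_dz = 2.0 * (pred - y) * pred * (1.0 - pred)
        dw, db = dloss_dz * x, dloss_dz
        # Stochastic Gradient Descent: update the parameters using the gradients.
        w -= learning_rate * dw
        b -= learning_rate * db

print(w, b)
```

Each pass visits the examples in a random order, which is the "stochastic" part, while the gradient for each example comes from the chain-rule (back-propagation) calculation.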
It is common for practitioners to say that they train their model using back-propagation. Technically, this is incorrect, and it would be incorrect even as a shorthand: back-propagation is not an optimization algorithm and cannot be used to train a model.
The term back-propagation is often misunderstood as meaning the whole learning algorithm for multi-layer neural networks. Actually, back-propagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient.
— Page 204, Deep Learning, 2016.
It would, however, be a fair shorthand to say that a neural network is trained or learns using Stochastic Gradient Descent, because it is assumed that the back-propagation algorithm is used to calculate the gradients as part of the optimization procedure.
That being said, a different algorithm can be used to optimize the parameters of a neural network, such as a genetic algorithm that does not require gradients at all. Conversely, if the Stochastic Gradient Descent optimization algorithm is used, an algorithm other than back-propagation can be used to calculate the gradients of the loss function with respect to the model parameters, such as alternate algorithms that implement the chain rule.
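To make the first point concrete, here is a deliberately simple gradient-free sketch. It uses stochastic hill climbing rather than a full genetic algorithm, and it reuses the toy data from the earlier sketch, but it illustrates the same idea: a gradient-free optimizer only needs loss evaluations, not gradients from back-propagation:

```python
import math
import random

# Same toy dataset as the sketch above (invented data).
data = [(-2.0, 0.0), (-0.5, 0.0), (0.5, 1.0), (2.0, 1.0)]

def loss(w, b):
    # A gradient-free optimizer only needs the loss value, not its gradients.
    total = 0.0
    for x, y in data:
        z = max(min(w * x + b, 60.0), -60.0)   # clamp to avoid overflow in exp
        pred = 1.0 / (1.0 + math.exp(-z))
        total += (pred - y) ** 2
    return total

# Stochastic hill climbing: randomly perturb the parameters and keep improvements.
w, b = 0.0, 0.0
best = loss(w, b)
for step in range(2000):
    cand_w = w + random.gauss(0.0, 0.1)   # random perturbation, no gradients needed
    cand_b = b + random.gauss(0.0, 0.1)
    cand_loss = loss(cand_w, cand_b)
    if cand_loss < best:
        w, b, best = cand_w, cand_b, cand_loss

print(w, b, best)
```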
Nevertheless, the “Stochastic Gradient Descent with Back-propagation” combination is widely used because it is the most efficient and effective general approach so far developed for fitting neural network models.