Mini-batch gradient descent is more computationally efficient than plain stochastic gradient descent, since each update is averaged over a batch and can exploit vectorized operations.
It also tends to improve generalization by guiding optimization toward flat minima.
Each mini-batch gradient approximates the gradient over the entire training set, and the noise in this approximation helps the optimizer avoid poor local minima.
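As a rough illustration, here is a minimal mini-batch gradient descent loop for linear regression in NumPy; the synthetic data, learning rate, and batch size are arbitrary choices for this sketch, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (assumed purely for illustration).
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)      # parameters to learn
lr = 0.1             # learning rate (arbitrary)
batch_size = 32      # mini-batch size (arbitrary)

for epoch in range(20):
    perm = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Mini-batch gradient of the mean squared error: a noisy
        # estimate of the gradient over the full training set.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad

print("parameter error:", np.linalg.norm(w - true_w))
```

Because each step averages over a batch, the update is cheaper and smoother than a single-sample SGD step, while the remaining gradient noise is what can nudge the optimizer out of sharp or poor minima.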