What are the benefits of mini-batch gradient descent?

It is more computationally efficient than stochastic gradient descent, since the gradient over a batch of examples can be computed with vectorized operations on modern hardware.
It tends to improve generalization, because the noise in mini-batch gradients biases optimization toward flat minima.
Each mini-batch gradient approximates the gradient of the entire training set, while the remaining stochasticity helps the optimizer escape poor local minima.
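The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it fits a simple linear model y = w*x + b with mini-batch gradient descent on synthetic data (the data, learning rate, and batch size are all arbitrary choices for the example).

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus small noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3 * X + 2 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0
lr, batch_size, epochs = 0.1, 32, 200

for _ in range(epochs):
    # Shuffle once per epoch so each mini-batch is a fresh random sample.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        err = w * xb + b - yb
        # Gradient of mean squared error, averaged over the mini-batch:
        # the average makes each step an estimate of the full-batch gradient.
        w -= lr * 2 * np.mean(err * xb)
        b -= lr * 2 * np.mean(err)

print(w, b)  # converges near the true parameters w=3, b=2
```

Each update uses only `batch_size` examples, so one pass over the data takes several cheap, vectorized steps instead of one expensive full-batch step, while averaging within the batch keeps each step a reasonable estimate of the full gradient.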