In my last post I talked about how we iterate to approximate the Ordinary Least Squares solution. This is where Gradient Descent comes in. (It’s decent enough :p)
Gradient Descent is a numerical method for approximating the OLS solution: instead of solving for the weights in closed form, it iteratively searches for the weights that minimize the loss function.
Gradient Descent isn't limited to OLS, though — it can be applied to almost any differentiable optimization problem, be it ML or DL. It starts at a random point (the weights can also be initialized to specific values using certain initializers) and iteratively moves in the direction where the loss decreases. For a convex loss like the OLS objective this leads to the global minimum; for non-convex losses it may instead settle in a local minimum.
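The core loop above can be sketched in a few lines. Here's a minimal, illustrative example (the function, starting point, learning rate, and step count are all my own choices, not from any particular library) that minimizes the simple convex function f(x) = (x − 3)²:

```python
def gradient_descent(start, lr=0.1, steps=100):
    """Minimize f(x) = (x - 3)^2 by repeatedly stepping downhill."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2 at the current point
        x = x - lr * grad    # move in the direction where the loss decreases
    return x

x_min = gradient_descent(start=10.0)
print(x_min)  # ends up very close to the minimum at x = 3
```

Because this loss is convex, the loop converges to the global minimum no matter where it starts.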
After every iteration it calculates the loss and its gradient, and updates the weights of the model. The magnitude of each update is controlled by a parameter called the Learning Rate, alpha: the higher the value of alpha, the bigger the update steps, and convergence might be quicker — though if alpha is too large, the steps can overshoot the minimum and the loss may fail to converge at all.
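To see the learning rate in action, here's a small sketch that fits a one-weight least-squares model with two different values of alpha (the synthetic data, true weight of 3.0, and the specific alphas are assumptions for illustration):

```python
import numpy as np

# Synthetic data: y is roughly 3 * x plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

def fit(alpha, steps=200):
    """Gradient descent on mean squared error for a single weight w."""
    w = 0.0  # simple zero initialization
    for _ in range(steps):
        grad = -2 * np.mean((y - w * x) * x)  # d(MSE)/dw
        w -= alpha * grad                      # update scaled by the learning rate
    return w

print(fit(alpha=0.01))  # small alpha: slow, steady progress toward 3
print(fit(alpha=0.3))   # larger alpha: bigger steps, reaches 3 much faster
```

With a much larger alpha (say, 2.0 here), each step would overshoot and the weight would diverge — which is why the learning rate usually needs some tuning.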