A gradient is a vector of partial derivatives that points in the direction of a function's greatest rate of increase. In the context of optimization, it indicates the slope of the cost function at a given point.
Imagine you are at the top of a mountain. Your goal is to reach the field at the bottom, but there is a problem: you are blind. How can you find your way down? You feel the ground around you with small steps and move in the direction of the steepest descent. You do this iteratively, one step at a time, until you finally reach the bottom of the mountain.
That is exactly what gradient descent does. Its goal is to reach the lowest point of the mountain. The mountain is the cost surface plotted over the model's parameters, and the size of each step you take is the learning rate. Feeling the slope around you and deciding which way is steepest corresponds to computing the gradient at the current set of parameter values, which is done iteratively. The chosen direction is the one in which the cost function decreases (the opposite direction of the gradient). The lowest point of the mountain is the set of values, or weights, at which the cost function reaches its minimum (the parameters at which our model is most accurate).

It's a method for optimizing continuous functions.

It is an iterative method in which one begins at an arbitrary point in the domain, computes the gradient at that point, and then moves the test point in that direction (or in the opposite direction, depending on whether you are maximizing or minimizing the function). Repeat until the point is sufficiently close to a local optimum.
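The loop above can be sketched in a few lines of Python. This is a minimal illustration, minimizing the made-up function f(x) = (x - 3)^2; the learning rate, tolerance, and iteration cap are arbitrary choices, not part of the original description.

```python
def grad_f(x):
    # Analytic derivative of the example function f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)

def gradient_descent(x0, lr=0.1, tol=1e-8, max_iters=10_000):
    x = x0
    for _ in range(max_iters):
        g = grad_f(x)
        if abs(g) < tol:   # sufficiently close to a local optimum
            break
        x -= lr * g        # step opposite the gradient (we are minimizing)
    return x

print(gradient_descent(x0=0.0))  # converges near the minimum at x = 3
```

To maximize instead of minimize, the update would step in the same direction as the gradient (`x += lr * g`).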

Different methods exist for deciding how large a step to take. A straightforward approach is to perform a line search for a local optimum along the step direction and step directly to that optimum. Another is to use a fixed sequence of step sizes that decays exponentially.

Gradient descent is widely used to train machine learning models such as neural networks.
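As a small, self-contained illustration of that use, here is a hedged sketch of fitting a one-parameter linear model y ≈ w·x by gradient descent on the mean squared error. The data, learning rate, and iteration count are made up for the example.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]      # generated from y = 2x, so w should approach 2

def mse_grad(w):
    # d/dw of (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
    n = len(xs)
    return (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))

w = 0.0
for _ in range(1000):
    w -= 0.05 * mse_grad(w)    # learning rate 0.05, an arbitrary choice

print(round(w, 3))             # approaches 2.0, the slope of the data
```

Training a neural network follows the same pattern, just with many more parameters and gradients computed by backpropagation instead of by hand.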