Gradient Descent Optimization With Momentum

swapneel-panda-419bc751 · 28 June 2021 11:21

This can be achieved by updating the gradient_descent() function to take a “momentum” argument that defines the amount of momentum used during the search.

The change made to the solution must be remembered from the previous iteration of the loop, with an initial value of 0.0.

…

keep track of the change

change = 0.0

…

keep track of the change

change = 0.0
We can then break the update procedure down into first calculating the gradient, then calculating the change to the solution, calculating the position of the new solution, then saving the change for the next iteration.

…

calculate gradient

gradient = derivative(solution)

calculate update

new_change = step_size * gradient + momentum * change

take a step

solution = solution - new_change

save the change

change = new_change

…

calculate gradient

gradient = derivative(solution)

calculate update

new_change = step_size * gradient + momentum * change

take a step

solution = solution - new_change

save the change

change = new_change
The updated version of the gradient_descent() function with these changes is listed below.

gradient descent algorithm

def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# keep track of the change
change = 0.0
# run the gradient descent
for i in range(n_iter):
# calculate gradient
gradient = derivative(solution)
# calculate update
new_change = step_size * gradient + momentum * change
# take a step
solution = solution - new_change
# save the change
change = new_change
# evaluate candidate point
solution_eval = objective(solution)
# report progress
print(’>%d f(%s) = %.5f’ % (i, solution, solution_eval))
return [solution, solution_eval]

gradient descent algorithm

def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# keep track of the change
change = 0.0
# run the gradient descent
for i in range(n_iter):
# calculate gradient
gradient = derivative(solution)
# calculate update
new_change = step_size * gradient + momentum * change
# take a step
solution = solution - new_change
# save the change
change = new_change
# evaluate candidate point
solution_eval = objective(solution)
# report progress
print(’>%d f(%s) = %.5f’ % (i, solution, solution_eval))
return [solution, solution_eval]
We can then choose a momentum value and pass it to the gradient_descent() function.

After a little trial and error, a momentum value of 0.3 was found to be effective on this problem, given the fixed step size of 0.1.

…

define momentum

momentum = 0.3

perform the gradient descent search with momentum

best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)

…

define momentum

momentum = 0.3

perform the gradient descent search with momentum

best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
Tying this together, the complete example of gradient descent optimization with momentum is listed below.

example of gradient descent with momentum for a one-dimensional function

from numpy import asarray
from numpy.random import rand
from numpy.random import seed

objective function

def objective(x):
return x**2.0

derivative of objective function

def derivative(x):
return x * 2.0

gradient descent algorithm

def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# keep track of the change
change = 0.0
# run the gradient descent
for i in range(n_iter):
# calculate gradient
gradient = derivative(solution)
# calculate update
new_change = step_size * gradient + momentum * change
# take a step
solution = solution - new_change
# save the change
change = new_change
# evaluate candidate point
solution_eval = objective(solution)
# report progress
print(’>%d f(%s) = %.5f’ % (i, solution, solution_eval))
return [solution, solution_eval]

seed the pseudo random number generator

seed(4)

define range for input

bounds = asarray([[-1.0, 1.0]])

define the total iterations

n_iter = 30

define the step size

step_size = 0.1

define momentum

momentum = 0.3

perform the gradient descent search with momentum

best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
print(‘Done!’)
print(‘f(%s) = %f’ % (best, score))

example of gradient descent with momentum for a one-dimensional function

from numpy import asarray
from numpy.random import rand
from numpy.random import seed

objective function

def objective(x):
return x**2.0

derivative of objective function

def derivative(x):
return x * 2.0

gradient descent algorithm

def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# keep track of the change
change = 0.0
# run the gradient descent
for i in range(n_iter):
# calculate gradient
gradient = derivative(solution)
# calculate update
new_change = step_size * gradient + momentum * change
# take a step
solution = solution - new_change
# save the change
change = new_change
# evaluate candidate point
solution_eval = objective(solution)
# report progress
print(’>%d f(%s) = %.5f’ % (i, solution, solution_eval))
return [solution, solution_eval]

seed the pseudo random number generator

seed(4)

define range for input

bounds = asarray([[-1.0, 1.0]])

define the total iterations

n_iter = 30

define the step size

step_size = 0.1

define momentum

momentum = 0.3

perform the gradient descent search with momentum

best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
print(‘Done!’)
print(‘f(%s) = %f’ % (best, score))
Running the example starts with a random point in the search space, then applies the gradient descent algorithm with momentum, reporting performance along the way.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the algorithm finds a good solution after about 13 iterations, with a function evaluation of about 0.0.

As expected, this is faster (fewer iterations) than gradient descent without momentum, using the same starting point and step size that took 27 iterations.

0 f([0.74724774]) = 0.55838
1 f([0.54175461]) = 0.29350
2 f([0.37175575]) = 0.13820
3 f([0.24640494]) = 0.06072
4 f([0.15951871]) = 0.02545
5 f([0.1015491]) = 0.01031
6 f([0.0638484]) = 0.00408
7 f([0.03976851]) = 0.00158
8 f([0.02459084]) = 0.00060
9 f([0.01511937]) = 0.00023
10 f([0.00925406]) = 0.00009
11 f([0.00564365]) = 0.00003
12 f([0.0034318]) = 0.00001
13 f([0.00208188]) = 0.00000
14 f([0.00126053]) = 0.00000
15 f([0.00076202]) = 0.00000
16 f([0.00046006]) = 0.00000
17 f([0.00027746]) = 0.00000
18 f([0.00016719]) = 0.00000
19 f([0.00010067]) = 0.00000
20 f([6.05804744e-05]) = 0.00000
21 f([3.64373635e-05]) = 0.00000
22 f([2.19069576e-05]) = 0.00000
23 f([1.31664443e-05]) = 0.00000
24 f([7.91100141e-06]) = 0.00000
25 f([4.75216828e-06]) = 0.00000
26 f([2.85408468e-06]) = 0.00000
27 f([1.71384267e-06]) = 0.00000
28 f([1.02900153e-06]) = 0.00000
29 f([6.17748881e-07]) = 0.00000
Done!
f([6.17748881e-07]) = 0.000000