This can be achieved by updating the gradient_descent() function to take a “momentum” argument that defines the amount of momentum used during the search.
The change made to the solution must be remembered from the previous iteration of the loop, with an initial value of 0.0.
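Concretely, each iteration blends the current gradient step with a fraction of the previous change. A minimal sketch of two consecutive updates (the starting point and values here are illustrative, not taken from the worked example below):

```python
def momentum_step(x, gradient, change, step_size=0.1, momentum=0.3):
	# blend the gradient step with a fraction of the previous change
	new_change = step_size * gradient + momentum * change
	# move downhill and return the change for the next iteration
	return x - new_change, new_change

# first update from x=1.0 on f(x)=x^2 (gradient 2x), previous change 0.0
x, change = momentum_step(1.0, 2.0, 0.0)   # x ~ 0.8, change ~ 0.2
# second update: the remembered change now contributes via momentum
x, change = momentum_step(x, 2.0 * x, change)   # x ~ 0.58, change ~ 0.22
```

Note how the second step (0.22) is larger than a plain gradient step (0.16) because the remembered change keeps the search moving in the same direction.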
```python
...
# keep track of the change
change = 0.0
...
```
We can then break the update procedure into four steps: calculate the gradient, calculate the change to the solution, compute the position of the new solution, and save the change for the next iteration.
```python
...
# calculate gradient
gradient = derivative(solution)
# calculate update
new_change = step_size * gradient + momentum * change
# take a step
solution = solution - new_change
# save the change
change = new_change
...
```
The updated version of the gradient_descent() function with these changes is listed below.
```python
# gradient descent algorithm
def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# keep track of the change
	change = 0.0
	# run the gradient descent
	for i in range(n_iter):
		# calculate gradient
		gradient = derivative(solution)
		# calculate update
		new_change = step_size * gradient + momentum * change
		# take a step
		solution = solution - new_change
		# save the change
		change = new_change
		# evaluate candidate point
		solution_eval = objective(solution)
		# report progress
		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]
```
We can then choose a momentum value and pass it to the gradient_descent() function.
After a little trial and error, a momentum value of 0.3 was found to be effective on this problem, given the fixed step size of 0.1.
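This kind of trial and error is easy to reproduce. The sketch below is a self-contained re-implementation of the loop (not the gradient_descent() function itself; the starting point, tolerance, and candidate momentum values are illustrative choices) that counts how many iterations each momentum value needs to drive f(x) = x^2 below a small threshold:

```python
def iterations_to_converge(momentum, start=0.8, step_size=0.1, tol=1e-5, max_iter=100):
	# run gradient descent with momentum on f(x) = x^2 and count the
	# iterations needed to bring the evaluation below the tolerance
	x, change = start, 0.0
	for i in range(max_iter):
		gradient = 2.0 * x  # derivative of x^2
		change = step_size * gradient + momentum * change
		x = x - change
		if x**2.0 < tol:
			return i + 1
	return max_iter

for m in [0.0, 0.3, 0.5]:
	print('momentum=%.1f: %d iterations' % (m, iterations_to_converge(m)))
```

On this convex function, moderate momentum consistently converges in fewer iterations than no momentum; too much momentum can overshoot and oscillate around the optimum.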
```python
...
# define momentum
momentum = 0.3
# perform the gradient descent search with momentum
best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
...
```
Tying this together, the complete example of gradient descent optimization with momentum is listed below.
```python
# example of gradient descent with momentum for a one-dimensional function
from numpy import asarray
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(x):
	return x**2.0

# derivative of objective function
def derivative(x):
	return x * 2.0

# gradient descent algorithm
def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# keep track of the change
	change = 0.0
	# run the gradient descent
	for i in range(n_iter):
		# calculate gradient
		gradient = derivative(solution)
		# calculate update
		new_change = step_size * gradient + momentum * change
		# take a step
		solution = solution - new_change
		# save the change
		change = new_change
		# evaluate candidate point
		solution_eval = objective(solution)
		# report progress
		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# seed the pseudo random number generator
seed(4)
# define range for input
bounds = asarray([[-1.0, 1.0]])
# define the total iterations
n_iter = 30
# define the step size
step_size = 0.1
# define momentum
momentum = 0.3
# perform the gradient descent search with momentum
best, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
print('Done!')
print('f(%s) = %f' % (best, score))
```
Running the example starts with a random point in the search space, then applies the gradient descent algorithm with momentum, reporting performance along the way.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that the algorithm finds a good solution after about 13 iterations, with a function evaluation of about 0.0.
As expected, this is faster (fewer iterations) than gradient descent without momentum, which took 27 iterations from the same starting point with the same step size.
```
>0 f([0.74724774]) = 0.55838
>1 f([0.54175461]) = 0.29350
>2 f([0.37175575]) = 0.13820
>3 f([0.24640494]) = 0.06072
>4 f([0.15951871]) = 0.02545
>5 f([0.1015491]) = 0.01031
>6 f([0.0638484]) = 0.00408
>7 f([0.03976851]) = 0.00158
>8 f([0.02459084]) = 0.00060
>9 f([0.01511937]) = 0.00023
>10 f([0.00925406]) = 0.00009
>11 f([0.00564365]) = 0.00003
>12 f([0.0034318]) = 0.00001
>13 f([0.00208188]) = 0.00000
>14 f([0.00126053]) = 0.00000
>15 f([0.00076202]) = 0.00000
>16 f([0.00046006]) = 0.00000
>17 f([0.00027746]) = 0.00000
>18 f([0.00016719]) = 0.00000
>19 f([0.00010067]) = 0.00000
>20 f([6.05804744e-05]) = 0.00000
>21 f([3.64373635e-05]) = 0.00000
>22 f([2.19069576e-05]) = 0.00000
>23 f([1.31664443e-05]) = 0.00000
>24 f([7.91100141e-06]) = 0.00000
>25 f([4.75216828e-06]) = 0.00000
>26 f([2.85408468e-06]) = 0.00000
>27 f([1.71384267e-06]) = 0.00000
>28 f([1.02900153e-06]) = 0.00000
>29 f([6.17748881e-07]) = 0.00000
Done!
f([6.17748881e-07]) = 0.000000
```
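The speed-up can also be checked programmatically from the same seeded starting point. The sketch below re-implements the loop rather than reusing gradient_descent(), and the tolerance of 1e-5 is an illustrative choice for "close enough to 0.0":

```python
from numpy import asarray
from numpy.random import rand, seed

def iterations_below(momentum, tol=1e-5, step_size=0.1, n_iter=30):
	# same seeded starting point as the example above
	seed(4)
	bounds = asarray([[-1.0, 1.0]])
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	change = 0.0
	for i in range(n_iter):
		gradient = solution * 2.0
		new_change = step_size * gradient + momentum * change
		solution = solution - new_change
		change = new_change
		# stop as soon as the evaluation drops below the tolerance
		if float(solution**2.0) < tol:
			return i
	return n_iter

print('without momentum:', iterations_below(0.0))
print('with momentum:   ', iterations_below(0.3))
```

With this tolerance, the run with momentum crosses the threshold noticeably earlier than the run without it, matching the behavior reported above.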