Visualization of AMSGrad Optimization

swapneel-panda-419bc751 · 9 June 2021 15:44

We can plot the progress of the AMSGrad search on a contour plot of the domain.

This can provide an intuition for the progress of the search over the iterations of the algorithm.

We must update the amsgrad() function to maintain a list of all solutions found during the search, then return this list at the end of the search.

The updated version of the function with these changes is listed below.

gradient descent algorithm with amsgrad

def amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
solutions = list()
# generate an initial point
x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# initialize moment vectors
m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
vhat = [0.0 for _ in range(bounds.shape[0])]
# run the gradient descent
for t in range(n_iter):
# calculate gradient g(t)
g = derivative(x[0], x[1])
# update variables one at a time
for i in range(x.shape[0]):
# m(t) = beta1(t) * m(t-1) + (1 - beta1(t)) * g(t)
m[i] = beta1**(t+1) * m[i] + (1.0 - beta1**(t+1)) * g[i]
# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
v[i] = (beta2 * v[i]) + (1.0 - beta2) * g[i]**2
# vhat(t) = max(vhat(t-1), v(t))
vhat[i] = max(vhat[i], v[i])
# x(t) = x(t-1) - alpha(t) * m(t) / sqrt(vhat(t)))
x[i] = x[i] - alpha * m[i] / (sqrt(vhat[i]) + 1e-8)
# evaluate candidate point
score = objective(x[0], x[1])
# keep track of all solutions
solutions.append(x.copy())
# report progress
print(’>%d f(%s) = %.5f’ % (t, x, score))
return solutions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

gradient descent algorithm with amsgrad

def amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
solutions = list()
# generate an initial point
x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# initialize moment vectors
m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
vhat = [0.0 for _ in range(bounds.shape[0])]
# run the gradient descent
for t in range(n_iter):
# calculate gradient g(t)
g = derivative(x[0], x[1])
# update variables one at a time
for i in range(x.shape[0]):
# m(t) = beta1(t) * m(t-1) + (1 - beta1(t)) * g(t)
m[i] = beta1**(t+1) * m[i] + (1.0 - beta1**(t+1)) * g[i]
# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
v[i] = (beta2 * v[i]) + (1.0 - beta2) * g[i]**2
# vhat(t) = max(vhat(t-1), v(t))
vhat[i] = max(vhat[i], v[i])
# x(t) = x(t-1) - alpha(t) * m(t) / sqrt(vhat(t)))
x[i] = x[i] - alpha * m[i] / (sqrt(vhat[i]) + 1e-8)
# evaluate candidate point
score = objective(x[0], x[1])
# keep track of all solutions
solutions.append(x.copy())
# report progress
print(’>%d f(%s) = %.5f’ % (t, x, score))
return solutions
We can then execute the search as before, and this time retrieve the list of solutions instead of the best final solution.

…

seed the pseudo random number generator

seed(1)

define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

define the total iterations

n_iter = 100

steps size

alpha = 0.007

factor for average gradient

beta1 = 0.9

factor for average squared gradient

beta2 = 0.99

perform the gradient descent search with amsgrad

solutions = amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
…

seed the pseudo random number generator

seed(1)

define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

define the total iterations

n_iter = 100

steps size

alpha = 0.007

factor for average gradient

beta1 = 0.9

factor for average squared gradient

beta2 = 0.99

perform the gradient descent search with amsgrad

solutions = amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2)
We can then create a contour plot of the objective function, as before.

…

sample input range uniformly at 0.1 increments

xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)

create a mesh from the axis

x, y = meshgrid(xaxis, yaxis)

compute targets

results = objective(x, y)

create a filled contour plot with 50 levels and jet color scheme

pyplot.contourf(x, y, results, levels=50, cmap=‘jet’)
1
2
3
4
5
6
7
8
9
10
…

sample input range uniformly at 0.1 increments

xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)

create a mesh from the axis

x, y = meshgrid(xaxis, yaxis)

compute targets

results = objective(x, y)

create a filled contour plot with 50 levels and jet color scheme

pyplot.contourf(x, y, results, levels=50, cmap=‘jet’)
Finally, we can plot each solution found during the search as a white dot connected by a line.

…

plot the sample as black circles

solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], ‘.-’, color=‘w’)
1
2
3
4
…

plot the sample as black circles

solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], ‘.-’, color=‘w’)
Tying this all together, the complete example of performing the AMSGrad optimization on the test problem and plotting the results on a contour plot is listed below.

example of plotting the amsgrad search on a contour plot of the test function

from math import sqrt
from numpy import asarray
from numpy import arange
from numpy.random import rand
from numpy.random import seed
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

objective function

def objective(x, y):
return x2.0 + y2.0

derivative of objective function

def derivative(x, y):
return asarray([x * 2.0, y * 2.0])

gradient descent algorithm with amsgrad

def amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
solutions = list()
# generate an initial point
x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# initialize moment vectors
m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
vhat = [0.0 for _ in range(bounds.shape[0])]
# run the gradient descent
for t in range(n_iter):
# calculate gradient g(t)
g = derivative(x[0], x[1])
# update variables one at a time
for i in range(x.shape[0]):
# m(t) = beta1(t) * m(t-1) + (1 - beta1(t)) * g(t)
m[i] = beta1**(t+1) * m[i] + (1.0 - beta1**(t+1)) * g[i]
# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
v[i] = (beta2 * v[i]) + (1.0 - beta2) * g[i]**2
# vhat(t) = max(vhat(t-1), v(t))
vhat[i] = max(vhat[i], v[i])
# x(t) = x(t-1) - alpha(t) * m(t) / sqrt(vhat(t)))
x[i] = x[i] - alpha * m[i] / (sqrt(vhat[i]) + 1e-8)
# evaluate candidate point
score = objective(x[0], x[1])
# keep track of all solutions
solutions.append(x.copy())
# report progress
print(’>%d f(%s) = %.5f’ % (t, x, score))
return solutions

seed the pseudo random number generator

seed(1)

define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

define the total iterations

n_iter = 100

steps size

alpha = 0.007

factor for average gradient

beta1 = 0.9

factor for average squared gradient

beta2 = 0.99

perform the gradient descent search with amsgrad

solutions = amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2)

sample input range uniformly at 0.1 increments

xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)

create a mesh from the axis

x, y = meshgrid(xaxis, yaxis)

compute targets

results = objective(x, y)

create a filled contour plot with 50 levels and jet color scheme

pyplot.contourf(x, y, results, levels=50, cmap=‘jet’)

plot the sample as black circles

solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], ‘.-’, color=‘w’)

show the plot

pyplot.show()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77

example of plotting the amsgrad search on a contour plot of the test function

from math import sqrt
from numpy import asarray
from numpy import arange
from numpy.random import rand
from numpy.random import seed
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

objective function

def objective(x, y):
return x2.0 + y2.0

derivative of objective function

def derivative(x, y):
return asarray([x * 2.0, y * 2.0])

gradient descent algorithm with amsgrad

def amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
solutions = list()
# generate an initial point
x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# initialize moment vectors
m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
vhat = [0.0 for _ in range(bounds.shape[0])]
# run the gradient descent
for t in range(n_iter):
# calculate gradient g(t)
g = derivative(x[0], x[1])
# update variables one at a time
for i in range(x.shape[0]):
# m(t) = beta1(t) * m(t-1) + (1 - beta1(t)) * g(t)
m[i] = beta1**(t+1) * m[i] + (1.0 - beta1**(t+1)) * g[i]
# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
v[i] = (beta2 * v[i]) + (1.0 - beta2) * g[i]**2
# vhat(t) = max(vhat(t-1), v(t))
vhat[i] = max(vhat[i], v[i])
# x(t) = x(t-1) - alpha(t) * m(t) / sqrt(vhat(t)))
x[i] = x[i] - alpha * m[i] / (sqrt(vhat[i]) + 1e-8)
# evaluate candidate point
score = objective(x[0], x[1])
# keep track of all solutions
solutions.append(x.copy())
# report progress
print(’>%d f(%s) = %.5f’ % (t, x, score))
return solutions

seed the pseudo random number generator

seed(1)

define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

define the total iterations

n_iter = 100

steps size

alpha = 0.007

factor for average gradient

beta1 = 0.9

factor for average squared gradient

beta2 = 0.99

perform the gradient descent search with amsgrad

solutions = amsgrad(objective, derivative, bounds, n_iter, alpha, beta1, beta2)

sample input range uniformly at 0.1 increments

xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)

create a mesh from the axis

x, y = meshgrid(xaxis, yaxis)

compute targets

results = objective(x, y)

create a filled contour plot with 50 levels and jet color scheme

pyplot.contourf(x, y, results, levels=50, cmap=‘jet’)

plot the sample as black circles

solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], ‘.-’, color=‘w’)

show the plot

pyplot.show()
Running the example performs the search as before, except in this case, the contour plot of the objective function is created.