We can plot the progress of the AdaMax search on a contour plot of the domain.
This can provide an intuition for the progress of the search over the iterations of the algorithm.
We must update the adamax() function to maintain a list of all solutions found during the search, then return this list at the end of the search.
The updated version of the function with these changes is listed below.
# gradient descent algorithm with adamax
def adamax(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
	solutions = list()
	# generate an initial point
	x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# initialize moment vector and weighted infinity norm
	m = [0.0 for _ in range(bounds.shape[0])]
	u = [0.0 for _ in range(bounds.shape[0])]
	# run iterations of gradient descent
	for t in range(n_iter):
		# calculate gradient g(t)
		g = derivative(x[0], x[1])
		# build a solution one variable at a time
		for i in range(x.shape[0]):
			# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)
			m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
			# u(t) = max(beta2 * u(t-1), abs(g(t)))
			u[i] = max(beta2 * u[i], abs(g[i]))
			# step_size(t) = alpha / (1 - beta1(t))
			step_size = alpha / (1.0 - beta1**(t+1))
			# delta(t) = m(t) / u(t)
			delta = m[i] / u[i]
			# x(t) = x(t-1) - step_size(t) * delta(t)
			x[i] = x[i] - step_size * delta
		# evaluate candidate point
		score = objective(x[0], x[1])
		solutions.append(x.copy())
		# report progress
		print('>%d f(%s) = %.5f' % (t, x, score))
	return solutions
We can then execute the search as before, and this time retrieve the list of solutions instead of the best final solution.
...
# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 60
# step size
alpha = 0.02
# factor for average gradient
beta1 = 0.8
# factor for the weighted infinity norm
beta2 = 0.99
# perform the gradient descent search with adamax
solutions = adamax(objective, derivative, bounds, n_iter, alpha, beta1, beta2)
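Because the updated function returns the full history of points rather than only the final result, we can still recover the last point visited from the end of the list if we want to check where the search ended up. A minimal sketch of this (not part of the original example) is shown below.

# hypothetical follow-up: report the last point visited by the search
final = solutions[-1]
print('final point %s with f = %.5f' % (final, objective(final[0], final[1])))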
We can then create a contour plot of the objective function, as before.
...
# sample input range uniformly at 0.1 increments
xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
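If the contour levels are hard to read, a colorbar can be added alongside the plot; this extra line is an optional addition and does not appear in the original listing.

# optionally add a colorbar showing the objective value for each contour level
pyplot.colorbar()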
Finally, we can plot each solution found during the search as a white dot connected by a line.
...
# plot the solutions found during the search as white dots connected by a line
solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')
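Optionally, the last point visited by the search could be drawn with a distinct marker so the end of the trajectory stands out; this extra line is an assumption of mine and is not part of the original example.

# optionally highlight the final point of the search with a black circle (not in the original)
pyplot.plot(solutions[-1, 0], solutions[-1, 1], 'o', color='k')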
Tying this all together, the complete example of performing the AdaMax optimization on the test problem and plotting the results on a contour plot is listed below.
# example of plotting the adamax search on a contour plot of the test function
from numpy import asarray
from numpy import arange
from numpy.random import rand
from numpy.random import seed
from numpy import meshgrid
from matplotlib import pyplot

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

# gradient descent algorithm with adamax
def adamax(objective, derivative, bounds, n_iter, alpha, beta1, beta2):
	solutions = list()
	# generate an initial point
	x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# initialize moment vector and weighted infinity norm
	m = [0.0 for _ in range(bounds.shape[0])]
	u = [0.0 for _ in range(bounds.shape[0])]
	# run iterations of gradient descent
	for t in range(n_iter):
		# calculate gradient g(t)
		g = derivative(x[0], x[1])
		# build a solution one variable at a time
		for i in range(x.shape[0]):
			# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)
			m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
			# u(t) = max(beta2 * u(t-1), abs(g(t)))
			u[i] = max(beta2 * u[i], abs(g[i]))
			# step_size(t) = alpha / (1 - beta1(t))
			step_size = alpha / (1.0 - beta1**(t+1))
			# delta(t) = m(t) / u(t)
			delta = m[i] / u[i]
			# x(t) = x(t-1) - step_size(t) * delta(t)
			x[i] = x[i] - step_size * delta
		# evaluate candidate point
		score = objective(x[0], x[1])
		solutions.append(x.copy())
		# report progress
		print('>%d f(%s) = %.5f' % (t, x, score))
	return solutions

# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 60
# step size
alpha = 0.02
# factor for average gradient
beta1 = 0.8
# factor for the weighted infinity norm
beta2 = 0.99
# perform the gradient descent search with adamax
solutions = adamax(objective, derivative, bounds, n_iter, alpha, beta1, beta2)
# sample input range uniformly at 0.1 increments
xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# plot the solutions found during the search as white dots connected by a line
solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')
# show the plot
pyplot.show()
Running the example performs the search as before, except in this case, the contour plot of the objective function is created.
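If you are running the example in an environment without an interactive display, you may prefer to save the figure to a file rather than calling pyplot.show(); a minimal sketch, assuming the hypothetical filename adamax_contour.png, is shown below.

# save the contour plot to a file instead of displaying it (hypothetical filename)
pyplot.savefig('adamax_contour.png', dpi=150)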