We can apply gradient descent with Adam to the test problem.

First, we need a function that calculates the derivative for this function.

f(x) = x^2
f'(x) = x * 2

The derivative of x^2 is x * 2 in each dimension. The derivative() function implements this below.

# derivative of objective function

def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])
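As a quick sanity check (not part of the original tutorial), the analytical gradient can be compared against a central finite-difference approximation; numerical_gradient() below is a hypothetical helper for this purpose.

```python
from numpy import asarray

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

# central finite-difference approximation of the gradient (hypothetical helper)
def numerical_gradient(f, x, y, h=1e-6):
	dx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
	dy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
	return asarray([dx, dy])

# the two estimates should agree closely at any point
print(derivative(0.3, -0.7))
print(numerical_gradient(objective, 0.3, -0.7))
```

If the two printed vectors disagree, the analytical derivative is wrong; this check is cheap insurance before running the optimizer.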
Next, we can implement gradient descent optimization.

First, we can select a random point in the bounds of the problem as a starting point for the search.

This assumes we have an array that defines the bounds of the search with one row for each dimension and the first column defines the minimum and the second column defines the maximum of the dimension.

# generate an initial point

x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
score = objective(x[0], x[1])
Next, we need to initialize the first and second moments to zero.

# initialize first and second moments

m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
We then run a fixed number of iterations of the algorithm defined by the "n_iter" hyperparameter.

# run iterations of gradient descent

for t in range(n_iter):

The first step is to calculate the gradient for the current set of parameters using the derivative() function.

g = derivative(x[0], x[1])
Next, we need to perform the Adam update calculations. We will perform these calculations one variable at a time using an imperative programming style for readability.

In practice, I recommend using NumPy vector operations for efficiency.

# build a solution one variable at a time

for i in range(x.shape[0]):

First, we need to calculate the moment.

# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)

m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
Then the second moment.

# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2

v[i] = beta2 * v[i] + (1.0 - beta2) * g[i]**2
Then the bias correction for the first and second moments.

# mhat(t) = m(t) / (1 - beta1(t))

mhat = m[i] / (1.0 - beta1**(t+1))

# vhat(t) = v(t) / (1 - beta2(t))

vhat = v[i] / (1.0 - beta2**(t+1))
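The bias correction matters most early in the search: because both moments are initialized to zero, the raw estimates are biased toward zero for small t. A small illustration with hypothetical values shows the effect on the very first iteration (t = 0):

```python
# illustrate bias correction on the first iteration (hypothetical values)
beta1 = 0.8
g = 0.5                                # example gradient value
m = beta1 * 0.0 + (1.0 - beta1) * g    # biased first moment: 0.1, far below g
mhat = m / (1.0 - beta1**1)            # corrected estimate: 0.5, matching g
print(m, mhat)
```

Without the correction, the first few steps would be much smaller than the gradient warrants; the 1 - beta1^(t+1) term shrinks toward zero bias as t grows, so the correction fades out over time.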
Then finally the updated variable value.

# x(t) = x(t-1) - alpha * mhat(t) / (sqrt(vhat(t)) + eps)

x[i] = x[i] - alpha * mhat / (sqrt(vhat) + eps)
This is then repeated for each parameter that is being optimized.

At the end of the iteration we can evaluate the new parameter values and report the performance of the search.

# evaluate candidate point

score = objective(x[0], x[1])

# report progress

print('>%d f(%s) = %.5f' % (t, x, score))
We can tie all of this together into a function named adam() that takes the names of the objective and derivative functions as well as the algorithm hyperparameters, and returns the best solution found at the end of the search and its evaluation.

This complete function is listed below.

def adam(objective, derivative, bounds, n_iter, alpha, beta1, beta2, eps=1e-8):
	# generate an initial point
	x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	score = objective(x[0], x[1])
	# initialize first and second moments
	m = [0.0 for _ in range(bounds.shape[0])]
	v = [0.0 for _ in range(bounds.shape[0])]
	# run iterations of gradient descent
	for t in range(n_iter):
		# calculate gradient g(t)
		g = derivative(x[0], x[1])
		# build a solution one variable at a time
		for i in range(x.shape[0]):
			# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)
			m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
			# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
			v[i] = beta2 * v[i] + (1.0 - beta2) * g[i]**2
			# mhat(t) = m(t) / (1 - beta1(t))
			mhat = m[i] / (1.0 - beta1**(t+1))
			# vhat(t) = v(t) / (1 - beta2(t))
			vhat = v[i] / (1.0 - beta2**(t+1))
			# x(t) = x(t-1) - alpha * mhat(t) / (sqrt(vhat(t)) + eps)
			x[i] = x[i] - alpha * mhat / (sqrt(vhat) + eps)
		# evaluate candidate point
		score = objective(x[0], x[1])
		# report progress
		print('>%d f(%s) = %.5f' % (t, x, score))
	return [x, score]
Note: we have intentionally used lists and an imperative coding style instead of vectorized operations for readability. Feel free to adapt this to a vectorized implementation with NumPy arrays for better performance.
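As a sketch of what such a vectorized adaptation might look like (this is my own variant, not part of the original tutorial), the inner per-variable loop collapses into array operations on x, m, and v directly:

```python
from numpy import asarray, sqrt, zeros
from numpy.random import rand, seed

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

# hypothetical vectorized variant of adam(): the per-variable loop is
# replaced by NumPy array arithmetic, which broadcasts over all dimensions
def adam_vectorized(objective, derivative, bounds, n_iter, alpha, beta1, beta2, eps=1e-8):
	# generate an initial point
	x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	m = zeros(bounds.shape[0])
	v = zeros(bounds.shape[0])
	for t in range(n_iter):
		g = derivative(x[0], x[1])
		m = beta1 * m + (1.0 - beta1) * g
		v = beta2 * v + (1.0 - beta2) * g**2
		mhat = m / (1.0 - beta1**(t + 1))
		vhat = v / (1.0 - beta2**(t + 1))
		x = x - alpha * mhat / (sqrt(vhat) + eps)
	return x, objective(x[0], x[1])

seed(1)
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
best, score = adam_vectorized(objective, derivative, bounds, 60, 0.02, 0.8, 0.999)
print('f(%s) = %f' % (best, score))
```

The update equations are identical; only the bookkeeping changes, and the same code then works unchanged for objectives of any dimensionality.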

We can then define our hyperparameters and call the adam() function to optimize our test objective function.

In this case, we will use 60 iterations of the algorithm with an initial step size of 0.02 and beta1 and beta2 values of 0.8 and 0.999 respectively. These hyperparameter values were found after a little trial and error.

seed(1)

# define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

n_iter = 60

alpha = 0.02

beta1 = 0.8

# factor for average squared gradient

beta2 = 0.999

best, score = adam(objective, derivative, bounds, n_iter, alpha, beta1, beta2)
print('Done!')
print('f(%s) = %f' % (best, score))
Tying all of this together, the complete example of gradient descent optimization with Adam is listed below.

from math import sqrt
from numpy import asarray
from numpy.random import rand
from numpy.random import seed

# objective function

def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function

def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

def adam(objective, derivative, bounds, n_iter, alpha, beta1, beta2, eps=1e-8):
	# generate an initial point
	x = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	score = objective(x[0], x[1])
	# initialize first and second moments
	m = [0.0 for _ in range(bounds.shape[0])]
	v = [0.0 for _ in range(bounds.shape[0])]
	# run iterations of gradient descent
	for t in range(n_iter):
		# calculate gradient g(t)
		g = derivative(x[0], x[1])
		# build a solution one variable at a time
		for i in range(x.shape[0]):
			# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)
			m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
			# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)^2
			v[i] = beta2 * v[i] + (1.0 - beta2) * g[i]**2
			# mhat(t) = m(t) / (1 - beta1(t))
			mhat = m[i] / (1.0 - beta1**(t+1))
			# vhat(t) = v(t) / (1 - beta2(t))
			vhat = v[i] / (1.0 - beta2**(t+1))
			# x(t) = x(t-1) - alpha * mhat(t) / (sqrt(vhat(t)) + eps)
			x[i] = x[i] - alpha * mhat / (sqrt(vhat) + eps)
		# evaluate candidate point
		score = objective(x[0], x[1])
		# report progress
		print('>%d f(%s) = %.5f' % (t, x, score))
	return [x, score]

seed(1)

# define range for input

bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])

n_iter = 60

alpha = 0.02