Second-Order Optimization Algorithms

Optimization involves finding values for input parameters that maximize or minimize an objective function.

Second-order optimization algorithms are those algorithms that make use of the second derivative of the objective function.

You may recall from calculus that the first derivative of a function is the rate of change, or slope, of the function at a specific point. The derivative can be followed downhill (or uphill) by an optimization algorithm toward the minima of the function (the input values that result in the smallest output of the objective function).

Algorithms that make use of the first derivative are called first-order optimization algorithms. An example of a first-order algorithm is the gradient descent optimization algorithm.

  • First-Order Methods : Optimization algorithms that make use of the first-order derivative to find the optima of an objective function.
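
For concreteness, below is a minimal sketch of gradient descent on an assumed one-variable example function, f(x) = x^2; the starting point and fixed step size are arbitrary choices for illustration, not prescribed values.

```python
# A minimal sketch of gradient descent, assuming the example objective
# f(x) = x^2 with first derivative f'(x) = 2x.

def objective(x):
    return x ** 2.0

def derivative(x):
    return 2.0 * x

x = 3.0           # starting point (assumed)
step_size = 0.1   # fixed step size (assumed)
for _ in range(50):
    x = x - step_size * derivative(x)  # move downhill along the slope

print(x, objective(x))  # x is driven toward 0.0, the minimum
```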

The second-order derivative is the derivative of the derivative, or the rate of change of the rate of change.

The second derivative can be followed to locate the optima of the objective function more efficiently. This makes intuitive sense: the more information we have about the objective function, the easier it may be to optimize it.

The second-order derivative tells us both which direction to move (like the first-order derivative) and lets us estimate how far to move in that direction, called the step size. Algorithms that make use of the second-order derivative are referred to as second-order optimization algorithms.

  • Second-Order Methods : Optimization algorithms that make use of the second-order derivative to find the optima of an objective function.

An example of a second-order optimization algorithm is Newton’s method.
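
The following is a minimal sketch of Newton's method in one dimension on an assumed example quadratic, f(x) = x^2 - 2x + 1, showing how the second derivative sets the step size at each update.

```python
# A minimal sketch of Newton's method in one dimension, assuming the
# example objective f(x) = x^2 - 2x + 1 with f'(x) = 2x - 2 and f''(x) = 2.
# Each update scales the step by the second derivative:
#   x_new = x - f'(x) / f''(x)

def first_derivative(x):
    return 2.0 * x - 2.0

def second_derivative(x):
    return 2.0

x = 5.0  # starting point (assumed)
for _ in range(5):
    x = x - first_derivative(x) / second_derivative(x)

print(x)  # 1.0; for a quadratic, a single Newton step lands on the minimum
```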

When an objective function has more than one input variable, the input variables together may be thought of as a vector, which may be familiar from linear algebra. Similarly, the first derivative of a function of multiple input variables is also a vector, where each element is called a partial derivative. This vector of partial derivatives is referred to as the gradient.

  • Gradient : Vector of partial first derivatives for multiple input variables of an objective function.
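
As a small illustration, consider the assumed example function f(x1, x2) = x1^2 + 3 * x2^2; its gradient is the vector of partial derivatives with respect to x1 and x2.

```python
import numpy as np

# Gradient of the assumed example function f(x1, x2) = x1^2 + 3 * x2^2:
# the partial derivatives are df/dx1 = 2 * x1 and df/dx2 = 6 * x2.
def gradient(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

print(gradient(np.array([1.0, 2.0])))  # [ 2. 12.]
```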

This idea generalizes to the second-order derivatives of the multivariate inputs, which are collected in a matrix of second derivatives called the Hessian matrix.

  • Hessian : Matrix of partial second-order derivatives for multiple input variables of an objective function.
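
Continuing the same assumed example, the Hessian collects the second-order partial derivatives into a symmetric matrix, which for this quadratic happens to be constant.

```python
import numpy as np

# Hessian of the assumed example function f(x1, x2) = x1^2 + 3 * x2^2:
# the second-order partial derivatives form a symmetric 2x2 matrix.
def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

print(hessian(np.array([1.0, 2.0])))
```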

The Hessian matrix is square and symmetric if the second derivatives are all continuous at the point where the derivatives are calculated. This is often the case when solving real-valued optimization problems and is an assumption made by many second-order methods. As such, it is common to describe second-order optimization algorithms as making use of, or following, the Hessian to the optima of the objective function.
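
To make following the Hessian concrete, here is a sketch of a single multivariate Newton step for the same assumed quadratic example, solving a linear system in the Hessian so that it supplies both the direction and the size of the move.

```python
import numpy as np

# A single multivariate Newton step for the assumed example
# f(x1, x2) = x1^2 + 3 * x2^2, solving H @ step = -gradient.

def gradient(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

x = np.array([1.0, 2.0])  # starting point (assumed)
step = np.linalg.solve(hessian(x), -gradient(x))
x = x + step

print(x)  # [0. 0.]; for this quadratic, one Newton step reaches the minimum
```

Note that the sketch solves the linear system rather than inverting the Hessian explicitly, which is the usual numerical practice.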