BFGS Optimization Algorithm

BFGS is a second-order optimization algorithm.

It is an acronym, named for the four co-discoverers of the algorithm: Broyden, Fletcher, Goldfarb, and Shanno.

It is a local search algorithm, intended for convex optimization problems with a single optimum.

The BFGS algorithm is perhaps best understood as belonging to a group of algorithms that extend Newton’s Method optimization algorithm, referred to as Quasi-Newton methods.

Newton’s method is a second-order optimization algorithm that makes use of the Hessian matrix.
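
To make this concrete, the sketch below applies a single Newton step to a simple quadratic bowl. The objective, gradient, and Hessian here are illustrative assumptions chosen only to show the update rule, not part of any particular library.

```python
# Minimal sketch of one Newton step on an illustrative quadratic objective.
import numpy as np

def objective(x):
    # f(x) = x0^2 + 4*x1^2, a simple convex bowl (chosen only for illustration)
    return x[0] ** 2 + 4.0 * x[1] ** 2

def gradient(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0], [0.0, 8.0]])

x = np.array([3.0, -2.0])
# Newton update: x_new = x - inverse(Hessian) @ gradient
x_new = x - np.linalg.inv(hessian(x)) @ gradient(x)
print(x_new)  # reaches the minimum [0, 0] in a single step for a quadratic
```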

A limitation of Newton’s method is that it requires the calculation of the inverse of the Hessian matrix. This is a computationally expensive operation and may not be stable depending on the properties of the objective function.

Quasi-Newton methods are second-order optimization algorithms that approximate the inverse of the Hessian matrix using the gradient, meaning that the Hessian and its inverse do not need to be available or calculated precisely for each step of the algorithm. The main difference between different Quasi-Newton optimization algorithms is the specific way in which the approximation of the inverse Hessian is calculated.

The BFGS algorithm is one specific way for updating the calculation of the inverse Hessian, instead of recalculating it every iteration. It, or its extensions, may be one of the most popular Quasi-Newton or even second-order optimization algorithms used for numerical optimization. A benefit of using the Hessian, when available, is that it can be used to determine both the direction and the step size to move in order to change the input parameters to minimize (or maximize) the objective function.
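
As a rough illustration of what such an update looks like, the sketch below implements the standard BFGS formula for revising the inverse Hessian approximation from the change in position (s) and the change in gradient (y) between iterations. The function and variable names are illustrative, not taken from any specific library.

```python
# Sketch of the BFGS update of the inverse Hessian approximation H.
import numpy as np

def bfgs_update(H, s, y):
    # s = x_new - x_old (change in position)
    # y = grad_new - grad_old (change in gradient)
    rho = 1.0 / float(y @ s)
    I = np.eye(len(s))
    # H_new = (I - rho * s y^T) H (I - rho * y s^T) + rho * s s^T
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
```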

Quasi-Newton methods like BFGS approximate the inverse Hessian, which can then be used to determine the direction to move, but we no longer have the step size.

The BFGS algorithm addresses this by using a line search in the chosen direction to determine how far to move in that direction.
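
In practice, a library implementation handles both the inverse Hessian update and the line search. For example, SciPy's minimize function provides a BFGS solver; the objective, gradient, and starting point below are the same illustrative quadratic used above.

```python
# Using SciPy's BFGS implementation, which performs the line search internally.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    return x[0] ** 2 + 4.0 * x[1] ** 2

def gradient(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

result = minimize(objective, x0=np.array([3.0, -2.0]), method='BFGS', jac=gradient)
print(result.x)  # approximately [0, 0]
```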

For the derivation and calculations used by the BFGS algorithm, I recommend the resources in the further reading section at the end of this tutorial.

The Hessian and its inverse are square matrices whose dimensions are given by the number of input parameters to the objective function, so the memory required grows with the square of the parameter count. As such, the matrices can become very large for hundreds, thousands, or millions of parameters.
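
A quick back-of-the-envelope calculation makes the point; the parameter count below is an arbitrary example.

```python
# Illustrative memory estimate for a dense n x n inverse Hessian stored as float64.
n = 1_000_000            # number of input parameters (arbitrary example)
bytes_total = n * n * 8  # 8 bytes per float64 entry
print(f"{bytes_total / 1e12:.0f} TB")  # roughly 8 TB for the full matrix
```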

Limited memory BFGS (or L-BFGS) is an extension to the BFGS algorithm that addresses the cost of having a large number of parameters. It does this by not requiring that the entire approximation of the inverse Hessian be stored; instead, it keeps only a short history of recent position and gradient changes and reconstructs the effect of the inverse Hessian from that history.
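
SciPy also exposes a limited-memory variant via the 'L-BFGS-B' method; the example below reuses the same illustrative quadratic as before.

```python
# Using SciPy's limited-memory L-BFGS-B implementation.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    return x[0] ** 2 + 4.0 * x[1] ** 2

def gradient(x):
    return np.array([2.0 * x[0], 8.0 * x[1]])

result = minimize(objective, x0=np.array([3.0, -2.0]), method='L-BFGS-B', jac=gradient)
print(result.x)  # approximately [0, 0]
```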