Univariate Function Optimization

We may need to find an optimal value of a function that takes a single parameter.

In machine learning, this may occur in many situations, such as:

  • Finding the coefficient of a model to fit to a training dataset.
  • Finding the value of a single hyperparameter that results in the best model performance.

This is called univariate function optimization.

We may be interested in the minimum or maximum outcome of the function, although this can be simplified to minimization, as a maximization problem can be converted into a minimization problem by negating the output of the function.
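For example, a minimal sketch of this conversion, using a made-up toy function with a maximum at x = 2.0 (not from any particular problem):

def f(x):
    # made-up function with a maximum at x = 2.0
    return -(x - 2.0)**2

def negated_f(x):
    # negating the output turns the maximum into a minimum at x = 2.0
    return -f(x)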

There may or may not be limits on the inputs to the function, referred to as constrained or unconstrained optimization respectively, and we assume that small changes in the input correspond to small changes in the output of the function, i.e. that the function is smooth.

The function may or may not have a single optimum, although we prefer that it does have a single optimum and that the shape of the function looks like a large basin. If this is the case, we know we can sample the function at one point and find the path down to the minimum of the function. Technically, this is referred to as a convex function for minimization (concave for maximization), and functions that don’t have this basin shape are referred to as non-convex.

  • Convex Target Function: There is a single optimum and the shape of the target function leads to this optimum.
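For example, a minimal sketch of a convex, basin-shaped target function (an assumed toy function, not tied to any dataset):

def objective(x):
    # convex, bowl-shaped function with a single minimum at x = 2.0
    return (x - 2.0)**2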

Nevertheless, the target function is sufficiently complex that we don’t know the derivative, meaning we cannot simply use calculus to analytically compute the minimum or maximum of the function where the gradient is zero. In this case, the function is treated as non-differentiable.

Although we might be able to sample the function with candidate values, we don’t know which input will result in the best outcome, and it may be too expensive to evaluate many candidate solutions.

Therefore, we require an algorithm that efficiently samples input values to the function.

One approach to solving univariate function optimization problems is to use Brent’s method.

Brent’s method is an optimization algorithm that combines a bisecting algorithm (Dekker’s method) and inverse quadratic interpolation. It can be used for constrained and unconstrained univariate function optimization.

The Brent-Dekker method is an extension of the bisection method. It is a root-finding algorithm that combines elements of the secant method and inverse quadratic interpolation. It has reliable and fast convergence properties, and it is the univariate optimization algorithm of choice in many popular numerical optimization packages.

— Pages 49-51, Algorithms for Optimization, 2019.
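As a rough, illustrative sketch of the inverse quadratic interpolation idea mentioned above (a single root-finding step on three assumed points, not SciPy’s implementation of Brent’s method):

def inverse_quadratic_step(a, b, c, fa, fb, fc):
    # fit x as a quadratic function of y through (a, fa), (b, fb), (c, fc),
    # then evaluate it at y = 0 to propose the next estimate of the root
    return (a * fb * fc / ((fa - fb) * (fa - fc))
            + b * fa * fc / ((fb - fa) * (fb - fc))
            + c * fa * fb / ((fc - fa) * (fc - fb)))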

Bisecting algorithms maintain a bracket (a lower and an upper bound) on the input values and repeatedly split the bracket in order to locate where in the domain the optimum lies, much like a binary search. Dekker’s method is one way this is achieved efficiently for a continuous domain.
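As a rough illustration of this bracketing idea only (a simple bracket-shrinking search for an assumed unimodal function, not Dekker’s actual method):

def bracket_minimize(f, lower, upper, tol=1e-6):
    # repeatedly shrink the bracket around the minimum of a unimodal function
    while (upper - lower) > tol:
        left = lower + (upper - lower) / 3.0
        right = upper - (upper - lower) / 3.0
        # discard the part of the bracket that cannot contain the minimum
        if f(left) < f(right):
            upper = right
        else:
            lower = left
    return (lower + upper) / 2.0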

Dekker’s method can get stuck on non-convex problems. Brent’s method modifies Dekker’s method to avoid getting stuck, and also approximates the second derivative of the objective function (the secant method) in an effort to accelerate the search.

As such, Brent’s method is generally preferred over most other univariate function optimization algorithms given its efficiency.

Brent’s method is available in Python via the minimize_scalar() SciPy function, which takes the function to be minimized as an argument. If your target function is constrained to a range, this range can be specified via the “bounds” argument (used together with the “bounded” method).
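For example, a minimal sketch of a bounded call, assuming the toy objective() defined earlier and an arbitrary search range:

from scipy.optimize import minimize_scalar

# minimize the function within an assumed range of -5.0 to 5.0
result = minimize_scalar(objective, bounds=(-5.0, 5.0), method='bounded')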

The function returns an OptimizeResult object, which acts like a dictionary containing the solution. Importantly, the ‘x’ key holds the input for the optimum, the ‘fun’ key holds the function output at the optimum, and the ‘nfev’ key holds the number of evaluations of the target function that were performed.
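Continuing the bounded sketch above, these values can also be read as attributes on the result object:

# summarize the result of the search (assumes 'result' from the call above)
print('x=%.5f' % result.x)
print('f(x)=%.5f' % result.fun)
print('evaluations=%d' % result.nfev)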

For an unconstrained search, Brent’s method can be requested directly by name:

# minimize the function
result = minimize_scalar(objective, method='brent')
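Tying these pieces together, a complete, minimal worked example, using the assumed toy objective with a known minimum at x = 2.0:

# minimize a simple convex function with Brent's method via SciPy
from scipy.optimize import minimize_scalar

# assumed toy objective with a single minimum at x = 2.0
def objective(x):
    return (x - 2.0)**2

# perform the search
result = minimize_scalar(objective, method='brent')
# summarize the result
print('x=%.5f, f(x)=%.5f, evaluations=%d' % (result.x, result.fun, result.nfev))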