Convergence in Machine Learning

Convergence generally refers to the tendency of the values of a process to settle into a stable behavior over time.

It is a useful idea when working with optimization algorithms.

Optimization refers to a type of problem that requires finding the set of inputs that results in the maximum or minimum value of an objective function. Solving an optimization problem is typically an iterative process that produces a sequence of candidate solutions until it ultimately arrives at a final solution.

This behavior, the dynamics of an optimization algorithm arriving at a stable final solution, is referred to as convergence, e.g. the convergence of the optimization algorithm. In this way, convergence defines the termination of the optimization algorithm.

Local descent involves iteratively choosing a descent direction and then taking a step in that direction and repeating that process until convergence or some termination condition is met.

— Page 13, Algorithms for Optimization, 2019.

  • Convergence: Stop condition for an optimization algorithm where a stable point is located and further iterations of the algorithm are unlikely to result in further improvement.
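
To make this concrete, below is a minimal sketch of local descent with a convergence-based stop condition. It assumes a simple one-dimensional quadratic objective, and the function names, step size, and tolerance are all illustrative choices, not taken from the books quoted above.

```python
# Minimal sketch: gradient-based local descent with a convergence stop condition.
# The objective, its gradient, and all constants are illustrative assumptions.

def objective(x):
    # simple convex objective: f(x) = x^2, minimized at x = 0
    return x ** 2

def gradient(x):
    # derivative of the objective: f'(x) = 2x
    return 2.0 * x

def local_descent(x, step_size=0.1, tolerance=1e-6, max_iterations=1000):
    for i in range(max_iterations):
        # choose a descent direction (the negative gradient) and take a step
        x_new = x - step_size * gradient(x)
        # convergence: further iterations are unlikely to improve the solution
        if abs(objective(x) - objective(x_new)) < tolerance:
            return x_new, i + 1
        x = x_new
    return x, max_iterations

solution, iterations = local_descent(x=3.0)
print(f"converged to x={solution:.6f} after {iterations} iterations")
```

Here the loop terminates either when the improvement between candidate solutions falls below the tolerance (convergence) or when the iteration budget is exhausted (a fallback termination condition).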

We might measure and explore the convergence of an optimization algorithm empirically, such as with learning curves. We might also explore its convergence analytically, such as via a convergence proof or an average-case computational complexity analysis.
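
For example, a learning curve can be produced by recording the objective function value at every iteration and plotting it. The sketch below assumes the same illustrative quadratic objective as above and that matplotlib is available.

```python
import matplotlib.pyplot as plt

# Sketch: record the objective value (here f(x) = x^2) at every iteration
# of a simple gradient descent run and plot it as a learning curve.
history = []
x = 3.0
for _ in range(50):
    history.append(x ** 2)      # objective value of the current candidate
    x = x - 0.1 * (2.0 * x)     # gradient descent step

plt.plot(history)
plt.xlabel("iteration")
plt.ylabel("objective value")
plt.title("Learning curve of a gradient descent run")
plt.show()
```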

Strong selection pressure results in rapid, but possibly premature, convergence. Weakening the selection pressure slows down the search process …

— Page 78, Evolutionary Computation: A Unified Approach, 2002.

Optimization, and the convergence of optimization algorithms, is an important concept in machine learning for those algorithms that fit (learn) a model on a training dataset via an iterative optimization algorithm, such as logistic regression and artificial neural networks.
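
For instance, scikit-learn's LogisticRegression exposes convergence directly: the tol and max_iter arguments control the stop condition of the underlying solver, and the fitted model's n_iter_ attribute reports how many iterations were needed. The sketch below uses a synthetic dataset for illustration and assumes scikit-learn is installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic dataset; the sizes and random_state are arbitrary.
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)

# tol and max_iter define the convergence-based stop condition of the solver.
model = LogisticRegression(tol=1e-4, max_iter=100)
model.fit(X, y)

# n_iter_ reports the number of iterations the solver ran before converging.
print(f"solver converged after {model.n_iter_[0]} iterations")
```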

As such, we may choose optimization algorithms that exhibit better convergence behavior than others, or spend a lot of time tuning the convergence dynamics (learning dynamics) of an optimization algorithm via its hyperparameters (e.g. the learning rate).

Convergence behavior can be compared across algorithms, often in terms of the number of iterations required until convergence, the objective function value of the stable point found at convergence, or a combination of these concerns.
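
As a sketch of such a comparison, the snippet below runs gradient descent on the same illustrative quadratic objective with two different learning rates and reports both the number of iterations needed to converge and the objective value at the stable point; all constants are assumptions chosen for illustration.

```python
def descend(step_size, x=3.0, tolerance=1e-6, max_iterations=1000):
    # Gradient descent on the illustrative objective f(x) = x^2.
    for i in range(max_iterations):
        x_new = x - step_size * (2.0 * x)
        # stop when further iterations are unlikely to improve the solution
        if abs(x ** 2 - x_new ** 2) < tolerance:
            return i + 1, x_new ** 2
        x = x_new
    return max_iterations, x ** 2

# Compare convergence behavior across two illustrative learning rates.
for step_size in (0.05, 0.3):
    iterations, final_value = descend(step_size)
    print(f"step_size={step_size}: {iterations} iterations, final objective={final_value:.2e}")
```

With the smaller learning rate the search takes more iterations to converge; with the larger one it converges faster on this well-behaved objective, which is the kind of trade-off such comparisons surface.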