Optimization for Machine Learning

Optimization is a field of mathematics concerned with finding a good or best solution among many candidates.

It is a foundational topic in machine learning because most machine learning algorithms are fit on historical data using an optimization algorithm. Broader problems, such as model selection and hyperparameter tuning, can also be framed as optimization problems.

Although having some background in optimization is critical for machine learning practitioners, it can be a daunting topic given that it is often described using highly mathematical language.

The field of optimization is enormous as it touches many other fields of study.

As such, there are hundreds of books on the topic, and most are textbooks filled with math and proofs. This is fair enough given that it is a highly mathematical subject.

Nevertheless, there are books that provide a more approachable description of optimization algorithms.

Not all optimization algorithms are relevant to machine learning; instead, it is useful to focus on a small subset of algorithms.

Frankly, it is hard to group optimization algorithms neatly, as there are many ways to categorize them. Nevertheless, it is important to have some idea of the optimization that underlies simpler algorithms, such as linear regression and logistic regression (e.g. convex optimization, least squares, Newton's method, etc.), and neural networks (first-order methods, gradient descent, etc.).

These are foundational optimization algorithms covered in most optimization textbooks.
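
To make the connection between these ideas concrete, here is a minimal sketch that fits a straight line to a handful of points in two ways: with the closed-form least squares solution and with batch gradient descent on the same mean squared error. The function names, toy data, learning rate, and step count are all assumptions chosen for illustration, not a reference implementation.

```python
# Fit y = w*x + b by minimizing mean squared error, two ways.


def least_squares_fit(xs, ys):
    """Closed-form ordinary least squares for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def gradient_descent_fit(xs, ys, learning_rate=0.05, n_steps=5000):
    """Batch gradient descent on the same mean squared error objective."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        # Residuals of the current line on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient (a first-order update).
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data made up for illustration, roughly y = 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

w_ls, b_ls = least_squares_fit(xs, ys)
w_gd, b_gd = gradient_descent_fit(xs, ys)
print("least squares:    w=%.4f b=%.4f" % (w_ls, b_ls))
print("gradient descent: w=%.4f b=%.4f" % (w_gd, b_gd))
```

Because this objective is convex, both approaches should arrive at essentially the same line. The iterative version matters when no closed-form solution exists, as is the case for neural networks.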

Not all optimization problems in machine learning are well behaved; the search problems that arise in AutoML and hyperparameter tuning are common examples. Therefore, knowledge of stochastic optimization algorithms is also required (simulated annealing, genetic algorithms, particle swarm, etc.). Although these are optimization algorithms, many of them are also studied as part of a field referred to as biologically inspired computation or computational intelligence.
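
As a rough illustration of a stochastic optimization algorithm, below is a minimal sketch of simulated annealing on a bumpy one-dimensional function. The objective function, bounds, step size, cooling schedule, and all names are assumptions made for this example; practical implementations vary in how they propose candidates and lower the temperature.

```python
import math
import random


def objective(x):
    """A bumpy 1-D function with several local minima (chosen for illustration)."""
    return x ** 2 + 10.0 * math.sin(3.0 * x) + 10.0


def simulated_annealing(objective, bounds, n_iterations=5000, step_size=0.5,
                        initial_temp=10.0, seed=1):
    """Minimize objective over [low, high] using simulated annealing."""
    rng = random.Random(seed)
    low, high = bounds
    # Start from a random point inside the bounds.
    current = rng.uniform(low, high)
    current_score = objective(current)
    best, best_score = current, current_score
    for i in range(n_iterations):
        # Propose a nearby candidate and clip it to the bounds.
        candidate = current + rng.gauss(0.0, step_size)
        candidate = min(max(candidate, low), high)
        candidate_score = objective(candidate)
        # Remember the best point seen so far.
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
        # Assumed cooling schedule: temperature shrinks as 1 / (i + 1).
        temperature = initial_temp / (i + 1)
        diff = candidate_score - current_score
        # Always accept improvements; accept worse moves with a probability
        # that falls as the move gets worse or the temperature gets lower.
        if diff < 0 or rng.random() < math.exp(-diff / temperature):
            current, current_score = candidate, candidate_score
    return best, best_score


best_x, best_score = simulated_annealing(objective, bounds=(-5.0, 5.0))
print("best x=%.4f score=%.4f" % (best_x, best_score))
```

Unlike gradient descent, this search never uses derivatives and occasionally accepts worse points, which gives it a chance of escaping a local minimum. It offers no guarantee of finding the global minimum, and results depend on the random seed.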