Weak vs. Strong Learners

In computational learning theory, specifically PAC learning, weak and strong learnability were formally defined, and it was initially an open question whether the two classes were equivalent. It was later shown that they are: a strong learner can be constructed from many weak learners. This result provided the basis for the boosting class of ensemble learning methods, although it took years after the theoretical finding before the first viable boosting methods implementing the procedure were developed.

Most notably, Adaptive Boosting (AdaBoost) was the first successful boosting method. It led to a large number of follow-on methods, culminating today in highly successful techniques such as gradient boosting and implementations such as Extreme Gradient Boosting (XGBoost). Generally, the goal of a boosting ensemble is to develop a large number of weak learners for a predictive modeling problem, then combine them as effectively as possible to achieve a strong learner (see the sketch after the list below). This is a worthwhile goal because weak learners are easy to prepare but have low skill, whereas strong learners are highly desirable but hard to prepare.

  • Weak Learner: Easy to prepare, but not desirable due to their low skill.
  • Strong Learner: Hard to prepare, but desirable because of their high skill.
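
To make the contrast concrete, the minimal sketch below (assuming scikit-learn and a synthetic dataset, both used purely for illustration) compares a single weak learner, a decision stump, against a strong learner built by boosting many such stumps with AdaBoost. The dataset, parameter values, and number of estimators are arbitrary choices, not taken from the original text.

```python
# Sketch: weak learner (decision stump) vs. strong learner (boosted stumps).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Weak learner: a single depth-1 decision tree (a decision stump).
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
print("weak learner accuracy: %.3f" % stump.score(X_test, y_test))

# Strong learner: many stumps combined sequentially by AdaBoost
# (scikit-learn's default base learner for AdaBoost is a decision stump).
boosted = AdaBoostClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
print("boosted ensemble accuracy: %.3f" % boosted.score(X_test, y_test))
```

On a problem like this, the stump alone is typically only somewhat better than guessing, while the boosted combination of stumps reaches much higher accuracy, which is exactly the weak-to-strong construction boosting promises.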

The procedure found to achieve this is to develop weak learners sequentially and add them to the ensemble, where each new weak learner is trained to pay more attention to the parts of the problem domain that prior models got wrong (illustrated in the sketch below). Although all boosting techniques follow this general procedure with their own specific differences and optimizations, the notion of weak and strong learners is a useful concept for machine learning and ensemble learning more generally.
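
The following is a minimal sketch of that sequential reweighting idea in the style of discrete AdaBoost, assuming scikit-learn decision stumps as the weak learners and class labels encoded as -1 and +1. Function names and parameters here are illustrative choices, not part of any library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, n_rounds=50):
    """Sequentially fit stumps, up-weighting examples prior stumps got wrong."""
    n = len(y)                    # y is assumed to contain labels in {-1, +1}
    w = np.full(n, 1.0 / n)       # start with uniform example weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted error of this weak learner
        if err >= 0.5:            # no better than chance: stop adding learners
            break
        alpha = 0.5 * np.log((1.0 - err) / (err + 1e-10))  # this learner's vote weight
        w *= np.exp(-alpha * y * pred)  # misclassified examples get larger weights
        w /= w.sum()                    # renormalize so weights form a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def boost_predict(X, learners, alphas):
    """The strong learner is a weighted vote over all the weak learners."""
    agg = sum(a * s.predict(X) for s, a in zip(learners, alphas))
    return np.sign(agg)
```

Each round refits a stump on the reweighted data, so later weak learners concentrate on the examples the ensemble so far handles poorly; the weighted vote of all of them behaves as a single strong learner.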