Now let’s take an example and try to understand how AdaBoost works.

Consider a simple data set with three columns: age, BMI, and gender.

| AGE | BMI | GENDER |
|-----|-----|--------|
| 25  | 24  | F      |
| 41  | 31  | F      |
| 56  | 28  | M      |
| 78  | 26  | F      |
| 62  | 30  | M      |

Let’s treat gender as the target column and the other two columns as the independent variables. Let’s say we fit a boosting algorithm, AdaBoost, on this data.

The very first thing it does is assign a weight to every record, called the initial weight. Each of the N records gets the same weight, 1/N (here 1/5), so the initial weights sum to 1.

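| AGE | BMI | GENDER | INITIAL WEIGHTS |
|-----|-----|--------|-----------------|
| 25  | 24  | F      | 1/5             |
| 41  | 31  | F      | 1/5             |
| 56  | 28  | M      | 1/5             |
| 78  | 26  | F      | 1/5             |
| 62  | 30  | M      | 1/5             |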
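As a minimal sketch of this initialization step, the snippet below assigns each of the five records a weight of 1/N. The names `records` and `initial_weights` are illustrative choices, not part of any library, and `Fraction` is used only so that the sum comes out as exactly 1.

```python
from fractions import Fraction

# The five records from the table: (age, BMI, gender)
records = [
    (25, 24, "F"),
    (41, 31, "F"),
    (56, 28, "M"),
    (78, 26, "F"),
    (62, 30, "M"),
]

# Every record starts with the same weight, 1/N
n = len(records)
initial_weights = [Fraction(1, n)] * n

print(initial_weights)       # five copies of Fraction(1, 5)
print(sum(initial_weights))  # 1
```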

Now, as I told you, AdaBoost is a sequential learning process, so the first model (the first weak learner, or first base model) is fit on this data. In AdaBoost the weak learners are stumps: decision trees that have one root and two leaves.


This is one weak learner, the very first weak learner.

Now, you may ask: how do we create this stump?

The fundamental concept is the same as for any decision tree. Whichever criterion we use, Gini index or entropy, the first two columns (age and BMI) are the candidate columns for creating the root node. The Gini index or entropy is checked for each candidate split, the best condition is selected, and the stump is created from it.
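To make the stump search concrete, here is a rough sketch that tries every midpoint threshold on age and BMI and keeps the split with the lowest weighted Gini impurity. The helper names (`weighted_gini`, `best_stump`, `FEATURES`) and the midpoint-threshold strategy are assumptions made for illustration; a real library may enumerate candidate splits differently.

```python
from fractions import Fraction

# Records from the article's table: (age, BMI, gender)
records = [
    (25, 24, "F"),
    (41, 31, "F"),
    (56, 28, "M"),
    (78, 26, "F"),
    (62, 30, "M"),
]
weights = [Fraction(1, 5)] * len(records)
FEATURES = {"age": 0, "bmi": 1}  # column positions in each record (illustrative)


def weighted_gini(rows):
    """Gini impurity of a group of (label, weight) pairs."""
    total = sum(w for _, w in rows)
    if total == 0:
        return Fraction(0)
    impurity = Fraction(1)
    for label in {lab for lab, _ in rows}:
        share = sum(w for lab, w in rows if lab == label) / total
        impurity -= share ** 2
    return impurity


def best_stump(records, weights):
    """Try every feature and midpoint threshold; keep the lowest weighted Gini."""
    total = sum(weights)
    best = None
    for name, col in FEATURES.items():
        values = sorted({r[col] for r in records})
        for lo, hi in zip(values, values[1:]):
            threshold = Fraction(lo + hi, 2)
            left = [(r[2], w) for r, w in zip(records, weights) if r[col] <= threshold]
            right = [(r[2], w) for r, w in zip(records, weights) if r[col] > threshold]
            score = (
                sum(w for _, w in left) / total * weighted_gini(left)
                + sum(w for _, w in right) / total * weighted_gini(right)
            )
            # Strict "<" means the first split found wins a tie
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best


score, feature, threshold = best_stump(records, weights)
print(feature, float(threshold), float(score))
```

On these five rows, `age <= 48.5` and `bmi <= 27` tie for the lowest impurity, and the sketch keeps the first one it finds. The prediction table that follows in the article is a "let’s say" illustration and is not derived from this particular stump.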

Once this stump is created, it is tested for accuracy on the same training data. It is possible that the stump classifies some of these records incorrectly.

So, let’s say the stump is created and testing it on this data produces the following results.

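| AGE | BMI | GENDER | INITIAL WEIGHTS | PREDICTION |
|-----|-----|--------|-----------------|------------|
| 25  | 24  | F      | 1/5             | Correct    |
| 41  | 31  | F      | 1/5             | Correct    |
| 56  | 28  | M      | 1/5             | Wrong      |
| 78  | 26  | F      | 1/5             | Correct    |
| 62  | 30  | M      | 1/5             | Correct    |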

So, what happens in the next iteration is these initial weights are changed. This is very important for boosting techniques.

Initially, we gave the same weight to all the records, which means all the records were equally important to the model. In the next iteration, the records that the previous model misclassified become more important.

In this case, the third record has been misclassified by the previous model. So the weight for this record goes up and, so that the weights still sum to 1 after normalization, the weights for all the other records come down.

In the next model, more importance is given to the previously misclassified records. The next weak learner therefore focuses more on this particular record and tries to classify it correctly.
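The article does not give the update formula, so the sketch below assumes the standard discrete AdaBoost rule: compute the weighted error of the stump, turn it into a learner weight `alpha = 0.5 * ln((1 - error) / error)`, scale misclassified records up by `exp(alpha)` and correct ones down by `exp(-alpha)`, then normalize. The variable names are illustrative.

```python
import math

# Initial weights and the stump's results from the table above
weights = [0.2] * 5
correct = [True, True, False, True, True]  # the third record was misclassified

# Weighted error: total weight of the misclassified records
error = sum(w for w, ok in zip(weights, correct) if not ok)

# Learner weight under the standard discrete AdaBoost formula (an assumption here)
alpha = 0.5 * math.log((1 - error) / error)

# Raise the weight of misclassified records, lower the rest, then normalize
updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
total = sum(updated)
new_weights = [w / total for w in updated]

print([round(w, 3) for w in new_weights])  # [0.125, 0.125, 0.5, 0.125, 0.125]
```

Under this rule the misclassified record ends up with half of the total weight, while each correctly classified record drops from 1/5 to 1/8.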

## **Final model**

Here we are talking about only 5 records. What if we have about 1 million records? Then a good number of records will be misclassified by this weak learner, and those records will be given higher weight for the next learner, ML model 2.

Similarly, ML model 2 will misclassify some of the observations. Those observations will again be given more weight, and the other observations’ weights will come down to normalize. All the models are created this way: whatever the previous model misclassifies, the next model tries to classify correctly. This is how, in sequence, each model takes its input from the previous model. That is why the technique is called adaptive boosting: each model adapts to the mistakes of the previous one.


In the end, the final model is a combination of all these weak learners. Hence this technique is called a boosting technique, and this algorithm is called AdaBoost.
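One common way to combine the weak learners, assumed here because the article does not spell it out, is a weighted vote: each stump votes for a class, and its vote counts in proportion to its learner weight `alpha`. In the sketch below, the three stumps, their thresholds, and their `alpha` values are all made-up illustrative numbers, not results fitted to the data.

```python
# Each weak learner: (feature index, threshold, label if <= threshold,
#                     label if > threshold, alpha)
# Feature index 0 is age, 1 is BMI. All values are hypothetical.
stumps = [
    (0, 48.5, "F", "M", 0.69),
    (1, 27.0, "F", "M", 0.55),
    (0, 70.0, "M", "F", 0.40),
]


def predict(record, stumps):
    """Return the class with the largest total alpha across all stumps."""
    votes = {}
    for col, threshold, left_label, right_label, alpha in stumps:
        label = left_label if record[col] <= threshold else right_label
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)


print(predict((78, 26), stumps))  # F
print(predict((56, 28), stumps))  # M
```

For the record with age 78 and BMI 26, the first stump votes M with weight 0.69, but the other two vote F with a combined weight of 0.95, so the ensemble predicts F.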

The important things to understand here are the initialization of the weights and their adjustment based on misclassification. The fundamental concepts of creating a decision tree or a stump, such as Gini index and entropy, remain the same. What is different here is the weights and their adjustment. This is how AdaBoost works.