Practical implementation of Adaboost with Python

swapneel-panda-419bc751 · 3 June 2021 12:30

First of all, we will load all the basic libraries.

import pandas as pd import numpy as np from sklearn.ensemble import AdaBoostClassifier from sklearn.tree import DecisionTreeClassifier from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split from sklearn.metrics import confusion_matrix, accuracy_score from sklearn.preprocessing import LabelEncoder

Here, I use the breast cancer dataset which can be obtained from sklearn.datasets. It is
also available in Kaggle.

breast_cancer = load_breast_cancer()

Let’sdeclare our independent variable and the target variable.

X = pd.DataFrame(breast_cancer.data, columns = breast_cancer.feature_names) y = pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)

The target is to classify whether it is a benign or malignant cancer. So, let’s encode the target variable as 0 and 1. 0 for malignant, and 1 for benign.

encoder = LabelEncoder() binary_encoded_y = pd.Series(encoder.fit_transform(y))

Now we split our dataset as train set and test set.

train_X, test_X, train_y, test_y = train_test_split(X, binary_encoded_y, random_state = 1)

we use the Adaboost classifier. Here, we use a decision tree for our model.

classifier = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200) classifier.fit(train_X, train_y)

Our Adaboost is fitted now. We will predict the target variable in the test set now.

prediction = classifier.predict(test_X)

Let’s obtain the confusion matrix.

confusion_matrix(test_y, prediction)

array

The main diagonal elements are well-classified data and secondary diagonal elements are misclassified data.

Let’s see the accuracy of classification now.

accuracy = accuracy_score(test_y, prediction) print('AdaBoost Accuracy: ', accuracy)

Adaboost accuracy

Our accuracy is 96.50%

It is quite a good accuracy.