Python Implementation of SVM

In this section, we will implement the SVM algorithm using Python. Here we will use the same dataset, user_data.csv, which we used in the Logistic regression and KNN classification tutorials.

  • Data Pre-processing step

Up to the data pre-processing step, the code remains the same as in those tutorials. Below is the code:

#Data Pre-processing Step
#importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

#Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

#Feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)

After executing the above code, we will pre-process the data. The code will give the dataset as:

The scaled output for the test set will be:
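To verify the pre-processing, you can print a few rows of the raw dataset and of the scaled arrays. This is a small sketch using the variables defined above; it is not part of the original code:

#Sketch: inspecting the pre-processed data
print(data_set.head())          # first rows of user_data.csv
print(x_test[:5])               # first five rows of the scaled test set
print(x_train.mean(axis=0))     # roughly 0 after standardization
print(x_train.std(axis=0))      # roughly 1 after standardization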

  • Fitting the SVM classifier to the training set:

Now the training set will be fitted to the SVM classifier. To create the SVM classifier, we will import the SVC class from the sklearn.svm library. Below is the code for it:

from sklearn.svm import SVC   # "Support vector classifier"
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have used kernel='linear', as here we are creating an SVM for linearly separable data. However, it can be changed for non-linear data. We then fitted the classifier to the training dataset (x_train, y_train).

Output:

Out[8]: SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto_deprecated', kernel='linear', max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001, verbose=False)
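As noted above, the kernel can be changed for data that is not linearly separable. A small sketch of an RBF-kernel variant follows; the hyperparameter values are only illustrative, not part of the original example:

#Sketch: an RBF-kernel SVM for non-linear data (illustrative hyperparameters)
from sklearn.svm import SVC
rbf_classifier = SVC(kernel='rbf', gamma='scale', C=1.0, random_state=0)
rbf_classifier.fit(x_train, y_train)
print(rbf_classifier.score(x_test, y_test))   # accuracy on the scaled test set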

The model performance can be altered by changing the value of C (the regularization parameter), gamma, and the kernel.
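One common way to choose these values is a grid search with cross-validation. The grid below is only an illustration of the idea, not a tuned configuration from the original tutorial:

#Sketch: tuning C, gamma and the kernel with GridSearchCV (illustrative grid)
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.1, 1],
    'kernel': ['linear', 'rbf'],
}
grid = GridSearchCV(SVC(random_state=0), param_grid, cv=5)
grid.fit(x_train, y_train)
print(grid.best_params_)
print(grid.best_score_)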

  • Predicting the test set result:
    Now, we will predict the output for the test set. For this, we will create a new vector y_pred. Below is the code for it:
#Predicting the test set result
y_pred= classifier.predict(x_test)

After getting the y_pred vector, we can compare y_pred with y_test to check the difference between the actual values and the predicted values.
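For example, the two vectors can be placed side by side, or summarized with an accuracy score. A small sketch:

#Sketch: comparing predicted and actual values
import numpy as nm
from sklearn.metrics import accuracy_score
print(nm.concatenate((y_test.reshape(-1, 1), y_pred.reshape(-1, 1)), axis=1)[:10])   # first 10 actual/predicted pairs
print(accuracy_score(y_test, y_pred))                                                # fraction of correct predictions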

Output: Below is the output for the prediction of the test set:

  • Creating the confusion matrix:
    Now we will check the performance of the SVM classifier, i.e. how many incorrect predictions it makes compared to the Logistic regression classifier. To create the confusion matrix, we need to import the confusion_matrix function from sklearn.metrics. After importing the function, we will call it and store the result in a new variable cm. The function takes two main parameters: y_true (the actual values) and y_pred (the values predicted by the classifier). Below is the code for it:
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)

Output:

[Output image: confusion matrix]

As we can see in the above output image, there are 66+24= 90 correct predictions and 8+2= 10 incorrect predictions. Therefore, we can say that our SVM model improved as compared to the Logistic regression model.
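The same counts can be read directly from the cm array; a small sketch (sklearn orders the entries of a binary confusion matrix as TN, FP, FN, TP, and the 66/24 figures come from the output above):

#Sketch: computing accuracy from the confusion matrix
tn, fp, fn, tp = cm.ravel()                    # true negatives, false positives, false negatives, true positives
accuracy = (tn + tp) / (tn + fp + fn + tp)     # (66 + 24) / 100 = 0.90 for the output above
print(accuracy)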

  • Visualizing the training set result:
    Now we will visualize the training set result. Below is the code for it:
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
mtp.title('SVM classifier (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

[Output image: SVM classifier (Training set) visualization]

As we can see, the above output appears similar to the Logistic regression output. In the output, we got a straight line as the hyperplane because we have used a linear kernel in the classifier. And we have also discussed above that for 2D space, the hyperplane in SVM is a straight line.
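Because the kernel is linear, the equation of that line can be read off the fitted classifier: classifier.coef_ and classifier.intercept_ hold the weights and bias of the decision boundary in the scaled feature space. A small sketch:

#Sketch: the hyperplane learned by the linear-kernel classifier (scaled feature space)
w = classifier.coef_[0]            # [w1, w2]; coef_ is only available for a linear kernel
b = classifier.intercept_[0]
print(w, b)
#Points satisfying w[0]*age + w[1]*salary + b = 0 form the straight decision boundary seen in the plot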

  • Visualizing the test set result:
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

[Output image: SVM classifier (Test set) visualization]

As we can see in the above output image, the SVM classifier has divided the users into two regions (Purchased or Not Purchased). Users who purchased the SUV are in the red region with the red scatter points, and users who did not purchase the SUV are in the green region with the green scatter points. The hyperplane has separated the two classes, Purchased and Not Purchased.
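As a final check, the overall test-set accuracy can also be obtained directly from the fitted classifier; a small sketch:

#Sketch: overall accuracy of the SVM classifier on the test set
print(classifier.score(x_test, y_test))    # should match the 90/100 result from the confusion matrix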