What is Classification in Machine Learning?

board-infinity · 6 October 2022 05:31

A classification problem is one in which the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”. A classification model draws a conclusion from observed values: given one or more inputs, it tries to predict the category of one or more outcomes.
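To make the idea of mapping inputs to a category concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. The toy measurements, the “red”/“blue” labels, and the helper names `fit_centroids` and `predict` are assumptions made up for illustration; they are not part of the Iris example that follows or of any library.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: two numeric inputs per sample, one category per sample
samples = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
labels = ["red", "red", "blue", "blue"]

centroids = fit_centroids(samples, labels)
prediction = predict(centroids, (1.1, 0.9))
print(prediction)
```

The model here is nothing more than one average point per class. Real classifiers such as random forests learn far richer rules, but the contract is the same: numeric inputs go in, a category comes out.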

Dataset Description

Title: Iris Plants Database
Attribute Information:
      1. sepal length in cm
      2. sepal width in cm
      3. petal length in cm
      4. petal width in cm
      5. class:
            -- Iris Setosa
            -- Iris Versicolour
            -- Iris Virginica
Missing Attribute Values: None
Class Distribution: 33.3% for each of 3 classes
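The class distribution can be checked with a few lines of code. This sketch assumes the copy of the data set bundled with scikit-learn (`sklearn.datasets.load_iris`) rather than the UCI file; the bundled copy uses the short class names `setosa`, `versicolor` and `virginica`.

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the Iris data (assumed here instead of the UCI file)
iris = load_iris()

# Translate the integer targets back to class names and count each one
names = [str(iris.target_names[i]) for i in iris.target]
counts = Counter(names)
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count} samples ({count / total:.1%})")
```

Each of the three classes should account for 50 of the 150 samples, which is the 33.3% stated in the description.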

# Python code to illustrate classification
# using the Iris data set
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
  
# Load the data set (the file has no header row)
url = ('https://archive.ics.uci.edu/ml/machine-learning-'
       'databases/iris/iris.data')
data = pd.read_csv(url, sep=',', header=None)

# Check for null values
print("Sum of NULL values in each column. ")
print(data.isnull().sum())

# Separate the features (columns 0-3) from the target class (column 4)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
  
# Encode the class names as the integers 0, 1 and 2
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
  
# Split the data into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
  
# Train a random forest classifier and predict the test set.
# No random_state is set on the forest, so results can vary slightly between runs.
classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
  
# Print the results
print('Confusion Matrix :')
print(confusion_matrix(y_test, predicted))
print('Accuracy Score :', accuracy_score(y_test, predicted))
print('Report : ')
print(classification_report(y_test, predicted))

Output:

Sum of NULL values in each column. 
0    0
1    0
2    0
3    0
4    0
dtype: int64

Confusion Matrix :
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]

Accuracy Score : 0.9777777777777777

Report : 
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        16
          1       1.00      0.94      0.97        18
          2       0.92      1.00      0.96        11

avg / total       0.98      0.98      0.98        45
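The accuracy and the per-class figures in the report can be re-derived from the confusion matrix alone. The sketch below does this in plain Python, assuming scikit-learn's convention that rows are the true classes and columns are the predicted classes; the variable names are illustrative.

```python
# Confusion matrix copied from the output: rows = true class, columns = predicted class
matrix = [
    [16, 0, 0],
    [0, 17, 1],
    [0, 0, 11],
]

n_classes = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[k][k] for k in range(n_classes))  # the diagonal
accuracy = correct / total

report = {}
for k in range(n_classes):
    true_positive = matrix[k][k]
    predicted_as_k = sum(matrix[i][k] for i in range(n_classes))  # column sum
    actually_k = sum(matrix[k])                                   # row sum = support
    # Both sums are non-zero for this matrix, so the divisions are safe here
    precision = true_positive / predicted_as_k
    recall = true_positive / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    report[k] = (precision, recall, f1, actually_k)

print(f"accuracy = {accuracy:.3f}")
for k, (precision, recall, f1, support) in report.items():
    print(f"class {k}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={support}")
```

The single misclassified sample, a class 1 flower predicted as class 2, is what lowers the recall of class 1 to 17/18 and the precision of class 2 to 11/12. The accuracy is 44/45, or about 0.978.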