What is the Math Behind KNN Algorithm?

board-infinity · 7 October 2022 06:49

Euclidean Distance Function

Euclidean distance can simply be defined as the shortest between the 2 points irrespective of the dimensions. The most common way to find the distance between is the Euclidean distance. According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by

dist((x, y), (a, b)) = √(x — a)² + (y — b)²

For a given value of K, the algorithm will find the k-nearest neighbors of the data point and then it will assign the class to the data point by having the class which has the highest number of data points out of all classes of the K neighbors.

After computing the distance, the input x gets assigned to the class with the largest probability:

Below is a function named euclidean_distance() that implements this in Python.

#calculating the Euclidean distance between two vectors
def euclidean_distance(row1, row2):
 distance = 0.0
 for i in range(len(row1)-1):
  distance += (row1[i] — row2[i])**2
 return sqrt(distance)

Minkowski Distance Function

Minkowski distance is a distance measurement between two points in the normed vector space (N-dimensional real space) and is a generalization of the Euclidean distance.

Consider two points P1 and P2:

P1: (X1, X2, ..., XN)

P2: (Y1, Y2, ..., YN)

Below is a function named Minkowski_distance() that implements this in Python.

# calculating Minkowski distance between vectors

from math import sqrt
# calculate Minkowski distance
def Minkowski_distance(a, b, p):
return sum(abs(e1-e2)**p for e1, e2 in zip(a,b))**(1/p)
# define data
 row1 = [10, 20, 15, 10, 5]
 row2 = [12, 24, 18, 8, 7]
# calculate distance (p=1)
 dist = Minkowski_distance(row1, row2, 1)
 print(dist)
# calculate distance (p=2)
 dist = Minkowski_distance(row1, row2, 2)
 print(dist)