The Naive Bayes algorithm in ML uses the famous Bayes' Theorem to predict the most probable target or hypothesis, GIVEN the data that we have. This data serves as our prior knowledge about the problem. Bayes' Theorem is stated as:

P(A|d) = (P(d|A) * P(A)) / P(d)

Here, we calculate the probability of the class being A, given the data d. This is known as the posterior probability.

P(d|A) is the probability of observing the data d given that class A is true. This is called the likelihood.

P(A) is the probability of class A being true, regardless of the data. This is called the prior probability of class A.

We calculate the posterior probability of every class for each data instance and select the class with the highest probability. This most probable class is called the Maximum a Posteriori (MAP) hypothesis.

We usually ignore P(d) because it is constant across all the classes for a given instance, so it does not affect which class scores highest.
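To make this concrete, here is a minimal sketch of MAP selection with made-up priors and likelihoods for two hypothetical classes ("spam" and "ham"); the numbers are purely illustrative:

```python
# Illustrative priors P(A) and likelihoods P(d|A) for one observed d.
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": 0.8, "ham": 0.1}

# Unnormalized posterior: P(d|A) * P(A). P(d) is omitted because it is
# the same for every class and cannot change the argmax.
scores = {c: likelihoods[c] * priors[c] for c in priors}
prediction = max(scores, key=scores.get)
print(prediction)  # spam: 0.8*0.3 = 0.24 beats ham: 0.1*0.7 = 0.07
```

Note that even though "ham" has the larger prior, the likelihood of the data under "spam" is strong enough to flip the decision.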

But why Naive?

Naive Bayes assumes that the features are independent of each other, given the class. This assumption makes the probabilities very easy to calculate: the likelihood of the whole instance is just the product of the per-feature likelihoods. In real-life scenarios, however, it's unlikely that all the features would be completely independent. Yet even with such a naive assumption, Naive Bayes performs surprisingly well!
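Putting the pieces together, here is a toy categorical Naive Bayes classifier; the tiny weather-style dataset, feature names, and counts are invented for illustration:

```python
from collections import defaultdict

# Hypothetical training data: (features, class label).
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
]

class_counts = defaultdict(int)          # class -> count
feat_counts = defaultdict(int)           # (class, feature, value) -> count
for features, label in train:
    class_counts[label] += 1
    for f, v in features.items():
        feat_counts[(label, f, v)] += 1

def predict(features):
    n = len(train)
    best, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n                   # prior P(c)
        for f, v in features.items():
            # Naive assumption: multiply per-feature likelihoods P(v|c).
            score *= feat_counts[(c, f, v)] / cc
        if score > best_score:
            best, best_score = c, score
    return best

print(predict({"outlook": "sunny", "windy": "no"}))  # -> "play"
```

In practice you would add smoothing (e.g. Laplace) so that an unseen feature value does not zero out the whole product, and work with log-probabilities to avoid underflow.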

#statistics #datascience #machinelearning