Probability VS Likelihood

Probability and Likelihood are often treated as the same thing, but in statistics they are not. There’s a clear difference between the two. Read on! :fast_forward:

Consider a dataset of heights of kids. Assume the heights are normally distributed (for ease of understanding) with a mean of 145 cm and a standard deviation (SD) of 2.5 cm. The smallest value is 120 cm and the largest is 165 cm. :bar_chart:

Now, what is the probability that a kid picked at random is taller than 150 cm? It is the area under the distribution curve from 150 cm to the maximum point. :white_check_mark:

We calculated:
P(Height > 150 cm | mean = 145 cm, SD = 2.5 cm), which is read as the probability of height being more than 150 cm GIVEN that mean = 145 cm and SD = 2.5 cm. We can keep calculating probabilities for different heights, but the mean and SD of our distribution remain FIXED.
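Here is a minimal sketch of that probability calculation in Python, using the mean of 145 cm and SD of 2.5 cm from the dataset description. The variable names are my own, and it assumes Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

# Parameters from the dataset description: mean 145 cm, SD 2.5 cm.
heights = NormalDist(mu=145, sigma=2.5)

# P(Height > 150 cm) is the area under the curve to the right of 150 cm.
p_above_150 = 1 - heights.cdf(150)

# Same area, but stopping at the stated maximum of 165 cm.
p_150_to_165 = heights.cdf(165) - heights.cdf(150)

print(f"P(Height > 150 cm) = {p_above_150:.4f}")
print(f"P(150 cm < Height <= 165 cm) = {p_150_to_165:.4f}")
```

Both numbers come out at roughly 2.3%, because 165 cm is so far out in the tail that almost no area lies beyond it.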

Now, considering the same dataset, I ask you: which mean and SD of a distribution curve best represent/fit the data? This time you have to calculate the “Likelihood” of a candidate distribution, GIVEN a data point. For one candidate mean and SD, the expression is:
L(mean = 150 cm, SD = 2.5 cm | height = 148 cm)
:bulb:
So the data point is fixed, and we vary the mean and SD to find the values that give the Maximum Likelihood, L.
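A rough sketch of the likelihood side, again with `statistics.NormalDist`. The likelihood of a candidate distribution given one data point is the height of its density curve at that point. The helper `likelihood` and the list of candidate means are hypothetical, chosen only for illustration. I keep the SD fixed at 2.5 here so that only the mean varies.

```python
from statistics import NormalDist


def likelihood(mean, sd, x):
    """L(mean, sd | x): density of Normal(mean, sd) evaluated at the data point x."""
    return NormalDist(mu=mean, sigma=sd).pdf(x)


observed_height = 148  # the fixed data point, in cm
sd = 2.5               # held fixed in this sketch

# Hypothetical candidate means to compare (not from the dataset).
candidate_means = [145, 148, 150]

for mean in candidate_means:
    value = likelihood(mean, sd, observed_height)
    print(f"L(mean={mean}, SD={sd} | height={observed_height}) = {value:.4f}")

# The candidate with the highest likelihood fits this data point best.
best_mean = max(candidate_means, key=lambda m: likelihood(m, sd, observed_height))
print("Best candidate mean:", best_mean)
```

With a single data point, the likelihood peaks when the candidate mean equals the data point itself. With a full dataset, you would multiply the likelihoods of all the points (or add their logs) and pick the mean and SD that maximize that product.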

#statistics #datascience