Central Limit Theorem in Statistics

The Normal Distribution plays a crucial role in statistics, and the reason is the Central Limit Theorem.

The central limit theorem states that, given a sufficiently large sample size, the sampling distribution of the mean will approximate a normal distribution, regardless of the variable's underlying distribution. Let me put the essence of that statement in plain words.

The data might follow any distribution. It could be perfectly normal, heavily skewed, exponential, or (almost) any distribution you can think of. However, if you repeatedly draw samples from the population and plot a histogram of their means, you will eventually find that this new distribution of means resembles the Normal Distribution!

In essence, it doesn't matter what distribution your data follows; the distribution of the sample means will be approximately normal.
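If you want to see this for yourself, here is a minimal simulation sketch in Python (assuming NumPy and Matplotlib are installed; the exponential population, the sample size of 50, and the 5,000 repetitions are all arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# A heavily skewed "population": 100,000 values from an exponential distribution.
population = rng.exponential(scale=1.0, size=100_000)

n = 50                # observations in each sample (arbitrary choice)
num_samples = 5_000   # how many samples we repeatedly draw

# Repeatedly draw a sample from the population and record its mean.
sample_means = np.array([
    rng.choice(population, size=n, replace=False).mean()
    for _ in range(num_samples)
])

# Even though the population is skewed, the histogram of the sample means
# is roughly bell-shaped and centred on the population mean.
print("population mean:      ", round(population.mean(), 3))
print("mean of sample means: ", round(sample_means.mean(), 3))
print("std of sample means:  ", round(sample_means.std(), 3))
print("sigma / sqrt(n):      ", round(population.std() / np.sqrt(n), 3))

plt.hist(sample_means, bins=50)
plt.title("Sample means from a skewed population")
plt.xlabel("sample mean")
plt.ylabel("count")
plt.show()
```

Notice that the spread of the sample means shrinks roughly like the population standard deviation divided by the square root of the sample size, which is the classic CLT picture.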

I just said that you "repeatedly take samples". But how large should each sample be? The rule of thumb says the sample size should be at least 30. So if each sample you draw contains 30 or more observations, the distribution of their means will be approximately normal, no matter what distribution you started from.
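To sanity-check that rule of thumb, here is a small sketch (assuming NumPy and SciPy; the exponential population and the specific sample sizes are arbitrary choices) that measures how the skewness of the sample-mean distribution fades as the sample size grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

def skew_of_sample_means(sample_size, num_samples=5_000):
    """Skewness of the sample-mean distribution for an exponential population."""
    means = rng.exponential(scale=1.0, size=(num_samples, sample_size)).mean(axis=1)
    return skew(means)

# The exponential population itself has skewness ~2. As the sample size grows,
# the distribution of the means loses that skew and looks more and more normal.
for n in (2, 5, 30, 100):
    print(f"n = {n:>3}: skewness of sample means = {skew_of_sample_means(n):.3f}")
```

By around n = 30 the skewness is already small, which is where the ">30" rule of thumb comes from; heavier-tailed populations may need larger samples.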

This property is extremely useful in statistics and Data Science for something I'll be posting about in the coming days.

(Spoiler alert: It’s Hypothesis Testing :p)

#datascience #statistics #machinelearning