A data distribution can be divided by three points: the Lower Quartile, the Middle Quartile (Median), and the Upper Quartile. Together with the two extremes, this means that if you plot the data points on the X axis, there will be 5 points in the following order:
- Minimum🔻
- Lower Quartile: Q1
- Middle Quartile (Median): Q2
- Upper Quartile: Q3
- Maximum🔺
What this means is that the lowest 25% of the data points lie between the Minimum and Q1. The next 25% lie between Q1 and Q2. Similarly, the next 25% lie between Q2 and Q3, and the last 25% between Q3 and the Maximum.
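To make the five points concrete, here is a minimal sketch using only Python's standard library (`statistics.quantiles`, available from Python 3.8). The helper name `five_number_summary` and the sample values are made up for illustration. Different tools use slightly different quartile conventions; this sketch assumes the "inclusive" method, which interpolates linearly between neighbouring points.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" is an assumption; other conventions can give
    # slightly different quartiles on small samples.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Made-up sample data, for illustration only.
sample = [1, 3, 4, 6, 7, 8, 10, 12, 40]
summary = five_number_summary(sample)
print(summary)
```

Each gap between neighbouring values in the returned tuple covers roughly a quarter of the data points.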
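Quartiles can be easily plotted using Box Plots, also called Box and Whisker plots. A Box Plot displays the quartiles of the data distribution and is one of the quickest ways to check whether your data is skewed: in skewed data the median sits off-center in the box, or one whisker is much longer than the other.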
A Box Plot displays the distribution of data based on the five-number summary described above. Box plot functions are available in the major Python libraries such as Pandas, Matplotlib and Seaborn.
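As a rough sketch of what this looks like in practice, the snippet below draws a single box plot with Matplotlib's `Axes.boxplot`. The wrapper `draw_box_plot` and the sample values are made up for illustration, and the non-interactive "Agg" backend is an assumption so the code also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove this line to use plt.show()
import matplotlib.pyplot as plt


def draw_box_plot(values):
    """Draw one box plot and return the figure plus Matplotlib's artists."""
    fig, ax = plt.subplots()
    # boxplot returns a dict of the drawn lines: "boxes", "medians",
    # "whiskers", "caps" and "fliers" (the points drawn as outliers).
    artists = ax.boxplot(values)
    ax.set_ylabel("value")
    return fig, artists


# Made-up sample data, for illustration only.
sample = [1, 3, 4, 6, 7, 8, 10, 12, 40]
fig, artists = draw_box_plot(sample)

# The median line and the outlier points can be read back from the artists.
median_value = float(artists["medians"][0].get_ydata()[0])
flier_values = [float(v) for v in artists["fliers"][0].get_ydata()]
print(median_value, flier_values)

# fig.savefig("boxplot.png")  # uncomment to write the image to disk
```

Pandas (`DataFrame.boxplot`) and Seaborn (`seaborn.boxplot`) offer similar functions. Matplotlib is used here only because it is the most basic of the three.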
Box Plots also show outliers, which are the points that lie more than 1.5*Q beyond the box, where Q is the interquartile range (Q3-Q1). So, points less than Q1-1.5Q (low extreme) or greater than Q3+1.5Q (high extreme) are considered outliers.
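The outlier rule can be sketched in a few lines of standard-library Python. The helper name `find_outliers`, its parameter `k`, and the sample values are made up for illustration, and the quartiles are computed with the "inclusive" convention as an assumption.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return (low_fence, high_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                # Q, the interquartile range
    low_fence = q1 - k * iqr     # points below this are low outliers
    high_fence = q3 + k * iqr    # points above this are high outliers
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers


# Made-up sample data, for illustration only.
sample = [1, 3, 4, 6, 7, 8, 10, 12, 40]
low, high, outliers = find_outliers(sample)
print(low, high, outliers)
```

For this sample Q1 = 4 and Q3 = 10, so Q = 6 and the fences fall at -5 and 19. Only the value 40 lies outside them.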
#datascience #statistics #machinelearning