Data Science is not possible without Statistics. In fact, Data Science is, to some extent, an extension of Statistics.
Stats is a huge field in itself, but these 5 concepts are too crucial to miss.
- Descriptive Statistics: A good starting point for any dataset is simply to look at its descriptive stats.
Mean, Median, Mode, Variance, Standard Deviation, and value counts give you a basic overview of the data.
- Data Distribution: Checking what the distribution of various features looks like is a fundamental task.
A distribution can be discrete or continuous. Discrete distributions include Binomial and Poisson.
Continuous distributions describe numerical data that can take any value within a range. Normal and Exponential distributions, as well as skewed ones, are among the most commonly seen.
- Dimensionality Reduction: Your data might have hundreds of features carrying similar information, which adds nothing but redundancy and computational overhead.
The information, or variance, in these hundreds of features can be projected onto a set of “components” in a transformed space.
These components are uncorrelated, and most of the variance is explained by just a few of them, so the rest can be dropped.
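
Here is a rough sketch of the three ideas above in Python with NumPy. Everything in it is made up for illustration: the `ages` sample, the random seed, the 10 synthetic features built from 2 hidden signals, and the helper name `skewness`. The dimensionality reduction part uses PCA via SVD, which is one common way to get uncorrelated components, not the only one.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# --- Descriptive statistics on a small hypothetical sample ---
ages = np.array([23, 25, 25, 31, 35, 35, 35, 42, 58])
value_counts = Counter(ages.tolist())
summary = {
    "mean": float(ages.mean()),
    "median": float(np.median(ages)),
    "mode": value_counts.most_common(1)[0][0],
    "variance": float(ages.var(ddof=1)),  # sample variance
    "std": float(ages.std(ddof=1)),       # sample standard deviation
}

# --- Data distribution: use skewness to check the shape ---
def skewness(x):
    """Third standardized moment: near 0 for symmetric data, > 0 for a long right tail."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

normal_sample = rng.normal(loc=0.0, scale=1.0, size=5000)
exponential_sample = rng.exponential(scale=1.0, size=5000)

# --- Dimensionality reduction: PCA via SVD ---
# Build 10 redundant features from 2 hidden signals plus a little noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance_ratio = S ** 2 / np.sum(S ** 2)

k = 2                               # keep only the first k components
X_reduced = X_centered @ Vt[:k].T   # shape (500, 2)

if __name__ == "__main__":
    print(summary)
    print("skew (normal):", skewness(normal_sample))
    print("skew (exponential):", skewness(exponential_sample))
    print("variance explained by first 2 components:", explained_variance_ratio[:k].sum())
```

Because the 10 features here are just noisy mixes of 2 signals, the first 2 components should carry almost all of the variance, which is the "rest can be dropped" situation described above. Real data is rarely this clean, so check the explained variance before deciding how many components to keep.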
#datascience #machinelearning #statistics