Explain the concept of PCA

Principal component analysis

Principal component analysis (PCA) is a widely used technique in statistics and data science for simplifying high-dimensional data.

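PCA gives us a way to represent a dataset in fewer dimensions. Instead of describing the data with the original coordinate axes, it chooses a new set of axes aligned with the directions along which the data varies most. The first principal component is the direction of maximum variance; each subsequent component is the direction of maximum remaining variance that is orthogonal to all previous ones. Keeping only the first few components gives a lower-dimensional representation that preserves as much of the data's variance as possible.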
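One standard way to compute these directions can be sketched in matrix notation as follows. This is one common formulation (an eigendecomposition of the covariance matrix), not the only one, and the symbols X, x̄, C, w_j, λ_j, W_k and Z are introduced here for illustration. Each eigenvalue λ_j equals the variance of the data along w_j; dividing by n - 1 instead of n rescales the eigenvalues but leaves the directions unchanged.

```latex
\begin{aligned}
&\text{Center the data: } \tilde{X} = X - \mathbf{1}\,\bar{x}^{\top}, \qquad X \in \mathbb{R}^{n \times d} \\
&\text{Covariance matrix: } C = \tfrac{1}{n}\,\tilde{X}^{\top}\tilde{X} \in \mathbb{R}^{d \times d} \\
&\text{Eigendecomposition: } C\,w_j = \lambda_j\,w_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0 \\
&\text{Keep the top } k \text{ directions: } W_k = [\,w_1 \;\cdots\; w_k\,] \in \mathbb{R}^{d \times k}, \qquad Z = \tilde{X}\,W_k \in \mathbb{R}^{n \times k}
\end{aligned}
```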

For instance, imagine we have a dataset D of two-dimensional data that lies along the line y = x. Each data point is represented in two dimensions: it has an x coordinate and a y coordinate. PCA would identify the direction of the vector <1,1> as the direction of maximum variance and use it as the new first axis. Because the data has no spread in the perpendicular direction <1,-1>, each point can be described by a single number, its position along <1,1>, so the dataset D can be represented in just one dimension without losing any information. Thus, PCA is a dimensionality reduction technique based on identifying the directions of maximum variance.
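The y = x example can be checked numerically. The following is a minimal, dependency-free Python sketch for the two-dimensional case only; the function name pca_2d and the four sample points are hypothetical, chosen for illustration. It centers the points, builds the 2x2 covariance matrix, computes its eigenvalues in closed form, and projects each point onto the eigenvector with the largest eigenvalue.

```python
import math


def pca_2d(points):
    """Hypothetical helper: PCA for a list of (x, y) points.

    Returns (u, (lam1, lam2), scores) where u is the unit vector along the
    direction of maximum variance, lam1 >= lam2 are the variances along the
    first and second principal components, and scores holds each point's
    1-D coordinate along u.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the covariance matrix [[a, b], [b, c]] (dividing by n).
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector for the larger eigenvalue lam1: (b, lam1 - a) solves
    # (C - lam1 * I) v = 0 when b != 0; otherwise the axes are already principal.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    u = (vx / norm, vy / norm)

    # Project each centered point onto u: one number per point.
    scores = [x * u[0] + y * u[1] for x, y in centered]
    return u, (lam1, lam2), scores


# Toy data lying exactly on the line y = x (made up for illustration).
points = [(1, 1), (2, 2), (3, 3), (4, 4)]
u, (lam1, lam2), scores = pca_2d(points)
print(u)           # approximately (0.7071, 0.7071), i.e. <1,1> normalized
print(lam1, lam2)  # 2.5 0.0: all of the variance lies along u
```

The second variance coming out as 0.0 is the numerical counterpart of saying the data has no spread along <1,-1>, so the single score per point is enough to reconstruct it (mean plus score times u). For more than two dimensions, a general eigendecomposition routine such as numpy.linalg.eigh would replace the closed-form 2x2 step.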