The curse of dimensionality describes the exponential increase in the computational effort required to process or analyze data as the number of data dimensions grows. The term was introduced by Richard E. Bellman, in the context of dynamic programming, to describe the growth in the volume of Euclidean space that comes with adding extra dimensions. Today the phenomenon is observed in fields such as machine learning, data analysis, and data mining, to name a few.
A common problem in machine learning is sparse data, which degrades the performance of machine learning algorithms and their ability to make accurate predictions. Data is considered sparse when many of the expected values in a dataset are missing, a common situation in large-scale data analysis.
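A minimal sketch of the problem, using a synthetic dataset (the 40% missing rate and dimensions are illustrative assumptions, not from the text): even a moderate fraction of missing entries leaves almost no fully observed rows, so naive row-wise deletion would discard nearly the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy dataset: 100 samples, 20 features, with ~40% of entries missing (NaN).
X = rng.random((100, 20))
mask = rng.random(X.shape) < 0.4
X[mask] = np.nan

# Overall sparsity: the fraction of missing entries.
sparsity = np.isnan(X).mean()
print(f"fraction of missing entries: {sparsity:.2f}")

# Rows with no missing value at all -- with 20 features and 40% missingness,
# almost no row survives, so dropping incomplete rows is not an option.
complete_rows = int((~np.isnan(X).any(axis=1)).sum())
print(f"fully observed rows: {complete_rows} / {X.shape[0]}")
```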
Another facet of the curse of dimensionality is ‘distance concentration’: as the dimensionality of the data increases, all the pairwise distances between samples/points in the space converge toward the same value. Many machine learning models, such as clustering or nearest-neighbor methods, rely on distance-based metrics to measure the similarity or proximity of samples, so distance concentration directly undermines them.
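Distance concentration is easy to observe empirically. The sketch below (sample sizes and dimensions are illustrative choices) draws points uniformly from the unit cube and measures the spread of pairwise Euclidean distances relative to their mean; the relative spread shrinks as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)
spreads = {}

for d in (2, 100, 1000):
    # Sample 200 points uniformly from the d-dimensional unit cube.
    points = rng.random((200, d))
    # All pairwise Euclidean distances (upper triangle, no self-distances).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(200, k=1)]
    # Relative spread: (max - min) / mean. This shrinks as d grows,
    # i.e. the distances "concentrate" around a common value.
    spreads[d] = (dists.max() - dists.min()) / dists.mean()
    print(f"d={d:5d}  relative spread = {spreads[d]:.3f}")
```

In low dimensions the nearest and farthest neighbors are clearly distinguishable; by d = 1000 the relative spread is a small fraction of the mean distance, which is exactly why distance-based methods lose discriminative power.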
FEATURE SELECTION TECHNIQUES
In feature selection techniques, the attributes are evaluated for their usefulness and then retained or eliminated. Some of the commonly used feature selection techniques are discussed below.
- Low Variance filter
- High Correlation filter
- Feature Ranking
- Forward selection
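The first two filters in the list above can be sketched in a few lines. This is a minimal illustration on synthetic data (the thresholds 0.1 for variance and 0.95 for correlation are assumptions, not prescribed values): a near-constant feature is removed by the low variance filter, and a near-duplicate feature is removed by the high correlation filter.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
informative = rng.normal(size=(n, 3))                                  # useful features
near_constant = 5.0 + rng.normal(scale=0.01, size=(n, 1))              # almost no variance
duplicate = informative[:, :1] + rng.normal(scale=0.01, size=(n, 1))   # ~copy of feature 0
X = np.hstack([informative, near_constant, duplicate])

# Low variance filter: drop features whose variance falls below a threshold.
variances = X.var(axis=0)
keep = variances > 0.1
print("kept after variance filter:", np.where(keep)[0])

# High correlation filter: among the survivors, drop one feature of any
# pair whose absolute Pearson correlation exceeds 0.95.
Xv = X[:, keep]
corr = np.corrcoef(Xv, rowvar=False)
drop = set()
for i in range(corr.shape[1]):
    for j in range(i + 1, corr.shape[1]):
        if j not in drop and abs(corr[i, j]) > 0.95:
            drop.add(j)
selected = [c for c in range(Xv.shape[1]) if c not in drop]
print("kept after correlation filter:", selected)
```

Both filters are unsupervised: they never look at the target variable, which is what distinguishes them from feature ranking and forward selection.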
FEATURE EXTRACTION TECHNIQUES
In feature extraction techniques, the high-dimensional attributes are combined into a smaller set of components (as in PCA or ICA) or modeled as arising from a smaller set of latent factors (as in FA).
- Principal Component Analysis (PCA)
- Factor Analysis (FA)
- Independent Component Analysis (ICA)
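A minimal PCA sketch via the singular value decomposition, on synthetic data whose signal lives in a 2-D subspace (the 10-D ambient dimension and noise level are illustrative assumptions): the first two principal components capture almost all the variance, so the data can be reduced from 10 dimensions to 2 with little loss.

```python
import numpy as np

rng = np.random.default_rng(3)
# 300 samples in 10 dimensions, but the signal lives in a 2-D subspace.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(scale=0.05, size=(300, 10))

# PCA via SVD of the centered data matrix: the rows of Vt are the
# principal directions, and S**2 gives the variance along each of them.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()
print(f"variance explained by first 2 components: {explained[:2].sum():.3f}")

# Project onto the top-2 principal components: 10-D -> 2-D.
X2 = Xc @ Vt[:2].T
print("reduced shape:", X2.shape)
```

FA and ICA differ in objective rather than mechanics: FA explains the observed covariance through latent factors plus per-feature noise, while ICA seeks components that are statistically independent rather than merely uncorrelated.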