What is the curse of dimensionality?
It is a phenomenon that occurs in high-dimensional space and hardly occurs in lower-dimensional space. As the number of dimensions grows, the data becomes sparse. High-dimensional space causes problems in clustering (it becomes very difficult to separate one cluster from another), the search space grows, and model complexity increases.
Imagine you have one-dimensional data laid out on a straight line (picture the line where the floor meets the wall), and say you plot 100 points on it. Now let’s make this 2D: a wall can be a 2D object. Plot the same 100 points on it. Moving on, let’s imagine 3D, which can be the room that contains the wall. Again, plot the 100 points.
You must have noticed that the points become more sparse as we moved from a line to a wall and then to a room. In a high-dimensional space the same number of points has to fill a volume that grows exponentially with the number of dimensions, so each point ends up far away from its neighbors.
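The line, wall and room picture can be checked numerically. The sketch below is only an illustration under stated assumptions: the points are drawn uniformly at random from a unit cube, and the function name `mean_nearest_neighbor_distance` is made up for this example. It scatters 100 points in spaces of different dimension and measures how far, on average, each point is from its closest neighbor.

```python
import math
import random

def mean_nearest_neighbor_distance(n_points, n_dims, seed=0):
    """Average distance from each point to its closest neighbor.

    Assumption: points are sampled uniformly from the unit cube [0, 1]^n_dims.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force search over all other points; fine for 100 points.
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / n_points

# Same 100 points, more and more dimensions: line, wall, room, and beyond.
for d in (1, 2, 3, 10, 100):
    print(d, round(mean_nearest_neighbor_distance(100, d), 3))
```

The printed distance should grow as the dimension goes up, even though the number of points never changes.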
This causes a problem when we try to predict from sparse data points: the prediction will be unreliable. Think of fitting a linear regression line to a bunch of points. You can build a good prediction model if you have many points close to each other, but if there are only, say, 2 points, it is very hard to come up with a line you can trust. The same effect is experienced in high dimensions. The volume to cover grows exponentially with the number of dimensions, so with a fixed number of points the data gets sparser and sparser and reliable prediction becomes next to impossible.
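To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes each axis is cut into equal bins, and both function names are hypothetical helpers written for this illustration. It shows how many points you would need to sample every axis as finely as the original line, and how little of the space 100 points can cover at best.

```python
def points_for_same_density(points_per_axis, n_dims):
    """Points needed to sample every axis as finely as the 1D line."""
    return points_per_axis ** n_dims

def max_fraction_of_cells_covered(n_points, bins_per_axis, n_dims):
    """Upper bound on the share of grid cells holding at least one point.

    Assumption: the space is split into bins_per_axis equal bins per axis.
    """
    n_cells = bins_per_axis ** n_dims
    return min(1.0, n_points / n_cells)

for d in (1, 2, 3, 10):
    print(
        d,
        points_for_same_density(100, d),
        max_fraction_of_cells_covered(100, 10, d),
    )
```

With 10 bins per axis, 100 points can touch every cell on a line or a wall, but only a small share of the cells in a room, and almost none once the dimension reaches 10.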