What is clustering tendency?

rajanikant-ghate · 3 August 2022 18:07

Why does this question exist?

Let’s consider the following array:

import numpy as np
x = np.arange(1000)

Can you cluster ‘x’? Try the elbow method: what does it say the optimum number of clusters should be?

Basically, ‘x’ is spread evenly over its range and hence doesn’t have ‘clustering tendency’: there are no natural groups to find.
Real data will rarely be as smooth as ‘x’, so from the elbow method alone it is hard to tell whether a dataset has any clustering tendency.
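
To see why the elbow method is not much help here, the sketch below computes the k-means inertia of ‘x’ for a few values of k. It is a minimal 1-D k-means written with NumPy only; the helper name `kmeans_inertia_1d` and the quantile-based initialisation are my own assumptions for illustration, not part of any library.

```python
import numpy as np

def kmeans_inertia_1d(x, k, n_iter=100):
    """Hypothetical helper: run a simple 1-D k-means and return the inertia."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: place the k centers at evenly spaced quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its points (keep it if it is empty).
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    # Inertia: sum of squared distances to the assigned center.
    return float(((x - centers[labels]) ** 2).sum())

x = np.arange(1000)
inertias = {k: kmeans_inertia_1d(x, k) for k in range(1, 9)}
for k, value in inertias.items():
    print(k, round(value))
```

For evenly spaced data the inertia should fall smoothly, roughly in proportion to 1/k², so the curve has no sharp bend that would single out one k.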

OK, then we can just plot ‘x’ and check the distribution. But here’s the catch: how would you plot data with 12 dimensions?

This is where a simple measure called the Hopkins statistic comes in. Underlying this statistic is a test whose null hypothesis is that the dataset is uniformly distributed. A value near 0.5 is consistent with that null hypothesis; as a commonly quoted rule of thumb, a Hopkins statistic greater than 0.75 indicates a clustering tendency at roughly the 90% confidence level.
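
Here is a minimal sketch of how the Hopkins statistic could be computed with NumPy alone. The function name `hopkins_statistic`, the default sample size of about 10% of the points, and the use of the data's bounding box as the sampling window are my assumptions; other implementations differ in these details, and some report 1 − H instead.

```python
import numpy as np

def hopkins_statistic(X, m=None, seed=0):
    """Hypothetical helper: Hopkins statistic H, with H near 1 meaning clustered."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D array as n points in 1 dimension
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)                 # assumption: sample about 10% of the points
    rng = np.random.default_rng(seed)

    # m real points, sampled without replacement.
    idx = rng.choice(n, size=m, replace=False)
    # m artificial points, drawn uniformly inside the bounding box of the data.
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))

    # u: distance from each artificial point to its nearest real point.
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w: distance from each sampled real point to its nearest *other* real point.
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf        # exclude the point itself
    w = dist.min(axis=1)

    # Distances are raised to the power d, as in the classical definition.
    return (u ** d).sum() / ((u ** d).sum() + (w ** d).sum())

rng = np.random.default_rng(1)
x = np.arange(1000)                                      # evenly spaced
uniform = rng.uniform(size=(1000, 2))                    # random uniform in 2-D
blobs = np.vstack([rng.normal(0, 0.1, size=(500, 2)),    # two tight, well separated groups
                   rng.normal(10, 0.1, size=(500, 2))])

for name, data in [("x", x), ("uniform", uniform), ("blobs", blobs)]:
    print(name, round(hopkins_statistic(data), 3))
```

With this convention, the two tight blobs should score close to 1 and the random uniform sample close to 0.5. Note that ‘x’ itself scores below 0.5 (it cannot exceed 1/3 here), because perfectly even spacing is more regular than random uniform data.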