To find the optimal number of clusters, the Silhouette Score is one of the popular approaches. It measures how close each observation is to the other observations in its own cluster compared with the observations in the nearest neighboring cluster. The number of clusters that gives the highest average score is usually preferred.
Let a_i be the mean distance between observation i and the other observations in the cluster to which observation i is assigned.
Let b_i be the minimum, over all other clusters, of the mean distance between observation i and the observations in that cluster (that is, the mean distance to the nearest neighboring cluster).
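These two quantities are conventionally combined into a single value per observation. The symbols s(i), S and n below are notation assumed here for illustration: s(i) is the silhouette value of observation i, and S is the overall score taken as the mean over all n observations.

```latex
s(i) = \frac{b_i - a_i}{\max(a_i,\, b_i)}, \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

When a_i is much smaller than b_i, s(i) approaches +1; when a_i is much larger than b_i, s(i) approaches -1.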
Conclusion:
- The Silhouette Score ranges from -1 to +1. A higher value indicates that the observations are better clustered.
- A Silhouette Score close to +1 for data point i means that the point is well matched to its assigned cluster and poorly matched to the neighboring clusters.
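The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, the usual formula s(i) = (b_i - a_i) / max(a_i, b_i), and the common convention that an observation alone in its cluster gets a score of 0. The function names and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def silhouette_values(points, labels):
    """Return the silhouette value s(i) for every observation."""
    # Group observation indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a_i: mean distance to the other members of the same cluster.
        a_i = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the members of any other cluster.
        b_i = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a_i, b_i)
        scores.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average the per-observation values into one score."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Hypothetical toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
poor_labels = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print(mean_silhouette(points, good_labels))
print(mean_silhouette(points, poor_labels))
```

The labeling that matches the two groups scores close to +1, while the mixed labeling scores much lower. To choose the number of clusters, the same comparison would be run on the labelings produced for each candidate number of clusters.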