What is K-means? How can you select K for K-means?
Elbow method: Calculate the within-cluster sum of squared errors (WSS) for a range of values of k, and choose the k at which the decrease in WSS slows sharply. WSS always falls as k grows, so the point of interest is where adding another cluster stops producing a large drop. In the plot of WSS versus k, this is visible as an elbow.
Silhouette method: The silhouette value measures how similar a point is to its own cluster (cohesion) compared with other clusters (separation). It ranges from -1 to +1. A high value is desirable and indicates that the point is well matched to its own cluster. If many points have a negative silhouette value, it may indicate that we have created too many or too few clusters. In practice, choose the k that gives the highest average silhouette value.
ref:https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92
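As a rough illustration of the two selection criteria, the sketch below computes WSS for a given clustering and the silhouette value of a single point, using the usual definition s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to the points of any other cluster. The function names `wss` and `silhouette` and the list-of-tuples data layout are assumptions made for this example, not part of any library. The sketch assumes at least two clusters and distinct points.

```python
import math

def wss(points, labels, centroids):
    """Within-cluster sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def mean_distance(point, group):
    """Mean Euclidean distance from `point` to every point in `group`."""
    return sum(math.dist(point, q) for q in group) / len(group)

def silhouette(points, labels, i):
    """Silhouette value of points[i]; requires at least two clusters."""
    own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # common convention: a point alone in its cluster scores 0
    a = mean_distance(points[i], own)  # cohesion
    b = min(                           # separation: nearest other cluster
        mean_distance(points[i], [q for j, q in enumerate(points) if labels[j] == c])
        for c in set(labels) - {labels[i]}
    )
    return (b - a) / max(a, b)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 0, 1, 1]
centroids = [(0, 0.5), (10, 10.5)]
print(wss(points, labels, centroids))
print(silhouette(points, labels, 0))
```

To apply the elbow method, one would compute `wss` for the clustering obtained at each candidate k and look for the bend in the curve. For the silhouette method, average `silhouette` over all points and compare across values of k.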
The k-means clustering algorithm partitions a data set into k sets (clusters). Each set has a centroid, and the algorithm tries to minimize the total squared distance between each centroid and the points in its set. To find those k centroids and their associated data points, the following algorithm is used:
1. Obtain the entire data set and choose k initial centroids, for example at random. (The k centroids can simply be the first k data points.)
2. For each data point:
   a. Calculate the distance (typically Euclidean) between the point and each of the k centroids.
   b. Assign the data point to the group of the centroid at the minimum distance.
3. To update each centroid, take the mean of all the points in its group.
4. Repeat steps 2 and 3 if any centroid changed in step 3.
5. Otherwise, the algorithm is finished.
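The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation. The function name `kmeans`, the `max_iter` safety cap, and the choice to leave an empty group's centroid unchanged are assumptions of this example. It uses the first k data points as initial centroids, as the first step permits.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    # Step 1: use the first k data points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):  # assumed cap so the loop always terminates
        # Step 2: assign each point to the group of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of the points in its group.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty group keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: repeat only if the centroids changed; otherwise finish.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated pairs of points.
centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(centroids, labels)
```

Because the result depends on the initial centroids, practical implementations usually run the algorithm from several starting points and keep the clustering with the lowest WSS.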