What is K-means? How can you select K for K-means?

Elbow method: Calculate the within-cluster sum of squared errors (WSS) for a range of values of k, and choose the k at which the decrease in WSS starts to level off. In a plot of WSS versus k, this point is visible as an elbow.
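As a rough illustration of the elbow method, the sketch below runs a very basic k-means for k = 1 to 5 on made-up data and prints the WSS for each k. The function names (sq_dist, kmeans_wss), the toy points, and the choice of the first k points as initial centroids are assumptions made for this example only; in practice a library implementation with several random restarts would normally be used.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, max_iter=100):
    """Run a basic k-means and return its within-cluster sum of squares."""
    # Assumption for this sketch: the first k points are the initial centroids.
    centroids = list(points[:k])
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Mean of each group; an empty group keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    # WSS: squared distance from every point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Made-up data: three tight groups of three points each.
points = [(1, 1), (9, 1), (6, 8), (1, 2), (2, 1),
          (9, 2), (10, 1), (6, 9), (7, 9)]

wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(f"k={k}  WSS={wss:.2f}")
```

On this toy data the WSS falls from roughly 210 at k = 1 to about 98 at k = 2 and 4 at k = 3, then changes only slightly, so the elbow is at k = 3. Real data rarely shows such a clean bend, which is why the elbow is usually read off a plot.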

Silhouette method: The silhouette value measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation). It ranges from -1 to +1. A high value is desirable and indicates that the point is well matched to its own cluster. If many points have a low or negative silhouette value, the clustering may have too many or too few clusters. To select k, compute the average silhouette value for each candidate k and choose the k that maximizes it.
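The silhouette value of a single point can be sketched as follows, where a is the mean distance to the other points in the same cluster and b is the mean distance to the points of the nearest other cluster, giving s = (b - a) / max(a, b). The function name silhouette_values and the toy points and labels are assumptions for illustration; the sketch assumes at least two clusters, not all points identical, and Python 3.8+ for math.dist.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point (assumes >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points of the same cluster (cohesion).
        # The point itself contributes 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster (separation).
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        values.append((b - a) / max(a, b))
    return values


# Made-up data: three tight groups of three points each.
points = [(1, 1), (9, 1), (6, 8), (1, 2), (2, 1),
          (9, 2), (10, 1), (6, 9), (7, 9)]

good_labels = [0, 1, 2, 0, 0, 1, 1, 2, 2]
bad_labels = [0, 1, 0, 0, 0, 1, 1, 2, 2]  # (6, 8) placed in the wrong group

good_values = silhouette_values(points, good_labels)
bad_values = silhouette_values(points, bad_labels)
print("mean silhouette, good labels:", sum(good_values) / len(good_values))
print("silhouette of the misplaced point:", bad_values[2])
```

With the sensible labelling every point scores well above zero, while the deliberately misplaced point gets a negative value, which is the signal the method relies on.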

ref: https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92

The k-means clustering algorithm partitions a data set into k sets (clusters). Each set has a centroid, and the algorithm aims to minimize the total squared distance between each centroid and the points in its set. To find those k centroids and their associated data points, the following algorithm is used:

  1. Obtain the entire data set and choose k initial centroids, for example k randomly selected data points. (The first k data points can also be used, although the result then depends on the order of the data.)

  2. For each data point:

     a. Calculate the distance (typically Euclidean) between the point and each of the k centroids.

     b. Assign the data point to the group of the centroid at the minimum distance.

  3. Update each centroid by taking the mean of all the points in its group.

  4. Repeat steps 2 and 3 for as long as the centroids change in step 3.

  5. Once the centroids no longer change, the algorithm is finished.
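The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the names kmeans and sq_dist, the max_iter and seed parameters, and the toy points are assumptions for the example, and squared Euclidean distance is used because it yields the same nearest centroid as Euclidean distance.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points at random as centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment: each point joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update: each centroid becomes the mean of the points in its group.
        updated = []
        for j in range(k):
            group = [p for p, lab in zip(points, labels) if lab == j]
            if group:
                updated.append(
                    tuple(sum(axis) / len(group) for axis in zip(*group)))
            else:
                updated.append(centroids[j])  # empty group keeps its centroid
        # Stop once no centroid has moved.
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Made-up data: three tight groups of three points each.
points = [(1, 1), (9, 1), (6, 8), (1, 2), (2, 1),
          (9, 2), (10, 1), (6, 9), (7, 9)]

centroids, labels = kmeans(points, k=3)
for p, lab in zip(points, labels):
    print(p, "-> group", lab)
```

Because the initial centroids are random, different seeds can lead to different final groupings; running the algorithm several times and keeping the result with the lowest WSS is a common remedy.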