# What is K-means? How can you select K for K-means?

Elbow method: Calculate the within-cluster sum of squared errors (WSS) for different values of k, and choose the k at which the decrease in WSS starts to level off. In the plot of WSS versus k, this point is visible as an elbow.
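As a rough illustration of the quantity behind the elbow plot, the sketch below computes WSS for k = 1, 2, 3 on a toy data set. The partitions are written out by hand here to keep the example short; in practice they would come from running k-means for each k. The helper names `centroid` and `wss` and the toy points are assumptions made for this example.

```python
def centroid(cluster):
    """Mean of a list of points, one coordinate at a time."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def wss(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)
    return total


# Toy data: two well-separated groups of three points each.
a = [(1, 1), (1, 2), (2, 1)]
b = [(8, 8), (8, 9), (9, 8)]

# Hand-written partitions for each k (an assumption for illustration only).
partitions = {
    1: [a + b],
    2: [a, b],
    3: [a, [(8, 8), (8, 9)], [(9, 8)]],
}

wss_by_k = {k: wss(p) for k, p in partitions.items()}
# How much WSS falls when moving from k - 1 to k clusters.
drops = {k: wss_by_k[k - 1] - wss_by_k[k] for k in (2, 3)}

for k in sorted(wss_by_k):
    print(k, round(wss_by_k[k], 3))
```

On this data the drop from k = 1 to k = 2 is far larger than the drop from k = 2 to k = 3, so the elbow sits at k = 2.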

Silhouette method: The silhouette value measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation). The silhouette value ranges from -1 to +1. A high value is desirable and indicates that the point is well matched to its own cluster. If many points have a negative silhouette value, it may indicate that we have created too many or too few clusters. To select k, compute the average silhouette value for several values of k and choose the k with the highest average.
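A minimal sketch of the silhouette value, assuming Euclidean distance and the usual definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other points in its own cluster and b is the smallest mean distance from the point to any other cluster. The function name `silhouette_values` and the toy labels are assumptions for this example, and at least two clusters are required.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Separation: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = silhouette_values(points, [0, 0, 0, 1, 1, 1])  # sensible grouping
bad = silhouette_values(points, [0, 1, 0, 1, 0, 1])   # groups mixed together

print(sum(good) / len(good), sum(bad) / len(bad))
```

With the sensible grouping every point scores close to +1, while the mixed grouping produces negative values for some points.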

The k-means clustering algorithm splits a data set into k sets (clusters). Each set has a centroid, and the algorithm seeks to minimize the sum of squared distances between each centroid and the points in its set. To find those k centroids and their associated data points, the following algorithm is used:

1. Obtain the entire data set and randomly choose k initial centroids. (Those k centroids can be the first k data points.)

2. For each data point:

   - Calculate the distance (typically Euclidean) between the point and each of the k centroids.

   - Assign the data point to the group of the centroid at the minimum distance.

3. To update each centroid, take the mean of all the points inside its group.

4. Repeat steps 2 and 3 if any centroid changed in step 3.

5. Otherwise, the algorithm is finished.
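The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it assumes Euclidean distance, uses the first k data points as initial centroids (the simple option mentioned above, which can give poor results on unlucky orderings), and assumes there are at least k points. The function name `kmeans` and the `max_iters` safety cap are assumptions for this example.

```python
import math


def kmeans(points, k, max_iters=100):
    """Return (centroids, labels) for a list of equal-length tuples."""
    # Initialization: take the first k data points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []

    for _ in range(max_iters):
        # Assignment: each point joins the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        labels = []
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
            labels.append(nearest)

        # Update: each centroid becomes the mean of its group.
        # An empty group keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]

        # Stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


points = [(1, 1), (8, 8), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the initial centroids, practical implementations usually run the algorithm several times from different starting points and keep the run with the lowest WSS.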