What is the group average method for calculating the similarity between two clusters for the Hierarchical Clustering Algorithm?

In the group average method, we take every pair of data points formed by one point from each cluster, calculate the similarity of each pair, and then average all of these similarities.

Mathematically this can be written as,

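sim(C1, C2) = ( ∑ sim(Pi, Pj) ) / ( |C1| * |C2| )

where Pi ∈ C1 and Pj ∈ C2, the sum runs over all such pairs, and |C1| and |C2| are the numbers of points in each cluster.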
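The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the point similarity `similarity` (1 / (1 + Euclidean distance)) is an assumption chosen for the example, and the names `group_average_similarity`, `c1` and `c2` are hypothetical. Any pairwise similarity function can be passed in instead.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def similarity(p, q):
    """Assumed point similarity for this sketch: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + dist(p, q))


def group_average_similarity(c1, c2, sim=similarity):
    """Average sim(Pi, Pj) over every pair with Pi in c1 and Pj in c2."""
    if not c1 or not c2:
        raise ValueError("both clusters must be non-empty")
    # Sum the similarity over all |C1| * |C2| cross-cluster pairs.
    total = sum(sim(p, q) for p, q in product(c1, c2))
    return total / (len(c1) * len(c2))


# Two small illustrative clusters of 2-D points (made-up data).
c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0), (3.0, 1.0)]
score = group_average_similarity(c1, c2)
print(score)
```

In agglomerative clustering, this score would be computed for every pair of current clusters, and the pair with the highest average similarity would be merged next.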

[Figure: calculating similarity between clusters in hierarchical clustering. Image source: Google Images]

Pros of Group Average:

  • This approach separates clusters well even when there is noise between them.

Cons of Group Average:

  • This approach is biased towards globular (roughly spherical) clusters.