In the group average method, we compute the similarity for every pair of points in which one point comes from each cluster, and then take the average of those similarities.
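Mathematically, this can be written as:
sim(C1, C2) = ∑ sim(Pi, Pj) / (|C1| * |C2|)
where Pi ∈ C1 and Pj ∈ C2, the sum runs over all such pairs, and |C1| and |C2| are the numbers of points in each cluster.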
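The formula can be sketched in a few lines of Python. This is only a minimal illustration: the function names `group_average_similarity` and `neg_euclidean` are made up for this example, and using negative Euclidean distance as the similarity is an assumption, since the method works with any pairwise similarity measure.

```python
import math

def group_average_similarity(c1, c2, sim):
    """Average of sim(p, q) over all pairs with p in c1 and q in c2."""
    total = sum(sim(p, q) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def neg_euclidean(p, q):
    # Assumed similarity for illustration: negative Euclidean distance,
    # so that closer points get a higher (less negative) score.
    return -math.dist(p, q)

# Two small hypothetical clusters of 2-D points.
c1 = [(0.0, 0.0), (0.0, 3.0)]
c2 = [(4.0, 0.0), (4.0, 3.0)]

score = group_average_similarity(c1, c2, neg_euclidean)
print(score)
```

Here the four cross-cluster distances are 4, 5, 5 and 4, so the average similarity is -4.5. Because every pair contributes to the score, a single stray point between the clusters has less influence than it would if only the closest pair were considered.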
Pros of Group Average:
- This approach does well in separating clusters if there is noise present between the clusters.
Cons of Group Average:
- This approach is biased towards globular clusters.