Random Forest defines proximity between two data points in the following way:
- Initialize all proximities to zero.
- For each tree, run all the cases down the tree.
- If case i and case j both end up in the same terminal (leaf) node, then the proximity prox(i, j) between i and j increases by one.
- Accumulate over all trees in the Random Forest and normalize by the number of trees in the forest, so that the proximity of a case to itself is 1.
Finally, it creates a proximity matrix, i.e., a square, symmetric matrix with 1 on the diagonal and values between 0 and 1 in the off-diagonal positions. Proximities close to 1 indicate that the observations are “alike”; conversely, the closer a proximity is to 0, the more dissimilar the cases are.
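The procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the terminal-node id of every case in every tree is already available as a table `leaves` (a hypothetical name; in scikit-learn, a fitted forest's `apply` method returns such a table), and it divides by the number of trees so that the diagonal comes out as 1, as the matrix description requires. The leaf ids in the toy input are made up.

```python
def proximity_matrix(leaves):
    """Compute Random Forest proximities from terminal-node assignments.

    leaves[i][t] is the id of the terminal node that case i reaches in tree t
    (assumed input; the ids only need to be comparable within one tree).
    """
    n_cases = len(leaves)
    n_trees = len(leaves[0])

    # Step 1: initialize all proximities to zero.
    prox = [[0.0] * n_cases for _ in range(n_cases)]

    # Steps 2-3: for each tree, add one for every pair sharing a terminal node.
    for t in range(n_trees):
        for i in range(n_cases):
            for j in range(n_cases):
                if leaves[i][t] == leaves[j][t]:
                    prox[i][j] += 1.0

    # Step 4: normalize by the number of trees so that prox[i][i] == 1.
    return [[count / n_trees for count in row] for row in prox]


# Toy input: 3 cases, 2 trees, made-up leaf ids.
leaves = [
    [1, 4],  # case 0
    [1, 5],  # case 1
    [2, 5],  # case 2
]
prox = proximity_matrix(leaves)
for row in prox:
    print(row)
```

Here cases 0 and 1 share a leaf in the first tree only, and cases 1 and 2 share a leaf in the second tree only, so both pairs get proximity 0.5, while cases 0 and 2 never meet and get 0. The triple loop is quadratic in the number of cases per tree, which is fine for a sketch but would need vectorizing for large data sets.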