The objectives of this section are:
to define agglomerative hierarchical clustering
to explain its basic algorithm
to briefly mention key issues it presents
By the time you have completed this section you will be able to:
define agglomerative hierarchical clustering
describe the algorithm
list key issues that this method creates/resolves
Agglomerative is a mouthful, and at first glance the word may look intimidating, but the concept is not difficult to understand. To agglomerate means to gather things together into a rounded mass; for example, collecting various pieces of play dough and pressing them into a ball. This is the basic idea behind agglomerative hierarchical clustering. It starts with each point as an individual cluster and, at each step, merges the closest pair of clusters. It continues until all points have been merged into one cluster.
The result is usually displayed graphically using a dendrogram, which is a tree-like diagram that shows both the cluster-subcluster relationships and the order in which the clusters were merged. Figure 3 shows a dendrogram for the points clustered in Figure 2.
The basic algorithm is as follows: start with the individual data points as clusters, and then repeatedly merge the two closest clusters until only one cluster remains. Figure 4 below presents a high-level formal representation of the algorithm.
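As a rough illustration of the loop just described, here is a minimal Python sketch. It is not the formal representation from the figure. The text does not say how "closest" is measured between clusters, so the sketch assumes single-link proximity (the smallest Euclidean distance between any two points of the two clusters). The names agglomerative and single_link_distance and the sample points are hypothetical, chosen only for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def single_link_distance(a, b):
    """Assumed proximity measure: smallest point-to-point distance between two clusters."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, cluster_distance=single_link_distance):
    """Merge the two closest clusters repeatedly until only one cluster remains.

    Returns the list of merges, in the order they happened, as (left, right) pairs.
    """
    # Start with every point as its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical sample data: two tight pairs and one outlying point.
points = [(0, 0), (0, 1), (5, 5), (5, 7), (10, 0)]
for left, right in agglomerative(points):
    print(left, "+", right)
```

The recorded merge order is the same information a dendrogram draws. This naive version recomputes every pairwise cluster distance at each step, which is acceptable for a handful of points but slow for large data sets.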