The objectives of this section are:
to define agglomerative hierarchical clustering
to explain its basic algorithm
to briefly mention key issues it presents
By the time you have completed this section you will be able to:
define agglomerative hierarchical clustering
describe the algorithm
list the key issues this method presents
Before we can calculate the proximity matrix, we must first define proximity between clusters. This can be defined with a particular cluster type in mind, as mentioned in section 2 of this chapter. If we use a graph-based view of clusters, the MIN, MAX, and Group Average techniques can be applied. Depending on which cluster type is chosen and which clustering technique is used, the resulting dendrogram may differ. Initially, each data point is treated as its own cluster, and the distances between these clusters are calculated according to the chosen technique.
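The construction of the initial proximity matrix can be sketched as follows. This is a minimal illustration, not code from the text: the data points and the choice of Euclidean distance are assumptions made for the example.

```python
import math

# Hypothetical 2-D data points; initially each point is its own cluster.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (5.0, 1.0)]

def euclidean(p, q):
    """Euclidean distance between two points (one common choice of proximity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# proximity[i][j] holds the distance between (singleton) clusters i and j.
proximity = [[euclidean(p, q) for q in points] for p in points]
```

The matrix is symmetric with zeros on the diagonal, so in practice only the upper triangle needs to be stored.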
After the proximity matrix is constructed, its values are used to decide which two clusters should be merged first. For instance, if we use the graph-based view of clusters and the MIN technique, the two closest clusters (those at minimum distance) are merged.
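Selecting the pair to merge amounts to scanning the proximity matrix for its smallest off-diagonal entry. A hedged sketch, assuming the matrix is represented as a symmetric list of lists of distances:

```python
def closest_pair(proximity):
    """Return (i, j, distance) for the two closest clusters in the matrix."""
    best = (None, None, float("inf"))
    n = len(proximity)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the matrix is symmetric
            if proximity[i][j] < best[2]:
                best = (i, j, proximity[i][j])
    return best
```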
The proximity matrix is then updated to reflect the new cluster constructed in the previous step.
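The two steps above, merging the closest pair and updating the matrix, repeat until a single cluster remains. A minimal end-to-end sketch using the MIN (single-link) update rule, under which the distance from the merged cluster to any other cluster k is min(d(i, k), d(j, k)); the function and its return value are illustrative, not from the text:

```python
def single_link(proximity, labels):
    """Agglomerate until one cluster remains; return the list of merges made."""
    proximity = [row[:] for row in proximity]  # work on a copy of the matrix
    clusters = [[label] for label in labels]   # each point starts as a cluster
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters: the smallest off-diagonal entry.
        n = len(proximity)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda ab: proximity[ab[0]][ab[1]])
        merges.append((clusters[i], clusters[j], proximity[i][j]))
        # MIN rule: distance from the merged cluster to each other cluster k.
        for k in range(n):
            if k not in (i, j):
                d = min(proximity[i][k], proximity[j][k])
                proximity[i][k] = proximity[k][i] = d
        # Fold cluster j into cluster i, then drop row and column j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        del proximity[j]
        for row in proximity:
            del row[j]
    return merges
```

The recorded merges, together with the distance at which each merge occurred, are exactly the information a dendrogram displays.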
Lack of a Global Objective Function: agglomerative hierarchical clustering techniques decide each merge locally, so there is no global objective function being optimized, as there is in the K-Means algorithm. This is actually an advantage of the technique, because optimizing a global objective function tends to be expensive in both time and space.
Ability to Handle Different Cluster Sizes: we have to decide how to treat clusters of different sizes when they are merged, for example whether points in a small cluster should carry the same weight as points in a large one.