The objectives of this section are:
to define agglomerative hierarchical clustering
to explain its basic algorithm
to briefly mention key issues it presents
By the time you have completed this section you will be able to:
define agglomerative hierarchical clustering
describe the algorithm
list the key issues this method presents
Before we can calculate the proximity matrix, we must first define proximity between clusters. This can be defined with a particular cluster type in mind, as mentioned in section 2 of this chapter. If we use a graph-based view of clusters, the MIN, MAX, and Group Average techniques can be applied. Depending on which cluster type is chosen and which clustering technique is used, the resulting dendrogram may differ. Initially, each data point is treated as its own cluster, and the distances between these clusters are calculated according to the chosen technique.
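The construction of the initial proximity matrix can be sketched as follows. This is a minimal illustration, not code from the text: the data points and the choice of Euclidean distance are assumptions made for the example.

```python
import math

# Hypothetical 2-D data points; initially each point is its own cluster.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (5.0, 1.0)]

def euclidean(p, q):
    """Euclidean distance between two points (one common choice of proximity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# proximity[i][j] holds the distance between (singleton) clusters i and j.
proximity = [[euclidean(p, q) for q in points] for p in points]
```

The matrix is symmetric with zeros on the diagonal, so in practice only the upper triangle needs to be stored.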
After the proximity matrix is constructed, its values are used to decide which two clusters should be merged first. For instance, if we use the graph-based view of clusters and the MIN technique, the two closest clusters (those at minimum distance) are merged.
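Selecting the pair to merge amounts to scanning the proximity matrix for its smallest off-diagonal entry. A hedged sketch, assuming the matrix is represented as a symmetric list of lists of distances:

```python
def closest_pair(proximity):
    """Return (i, j, distance) for the two closest clusters in the matrix."""
    best = (None, None, float("inf"))
    n = len(proximity)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the matrix is symmetric
            if proximity[i][j] < best[2]:
                best = (i, j, proximity[i][j])
    return best
```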
The proximity matrix is then updated to reflect the new cluster constructed in the previous step.
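The two steps above, merging the closest pair and updating the matrix, repeat until a single cluster remains. A minimal end-to-end sketch using the MIN (single-link) update rule, under which the distance from the merged cluster to any other cluster k is min(d(i, k), d(j, k)); the function and its return value are illustrative, not from the text:

```python
def single_link(proximity, labels):
    """Agglomerate until one cluster remains; return the list of merges made."""
    proximity = [row[:] for row in proximity]  # work on a copy of the matrix
    clusters = [[label] for label in labels]   # each point starts as a cluster
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters: the smallest off-diagonal entry.
        n = len(proximity)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda ab: proximity[ab[0]][ab[1]])
        merges.append((clusters[i], clusters[j], proximity[i][j]))
        # MIN rule: distance from the merged cluster to each other cluster k.
        for k in range(n):
            if k not in (i, j):
                d = min(proximity[i][k], proximity[j][k])
                proximity[i][k] = proximity[k][i] = d
        # Fold cluster j into cluster i, then drop row and column j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        del proximity[j]
        for row in proximity:
            del row[j]
    return merges
```

The recorded merges, together with the distance at which each merge occurred, are exactly the information a dendrogram displays.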
Lack of a Global Objective Function: agglomerative hierarchical clustering techniques decide each merge locally, so there is no global objective function being optimized, as there is in the K-Means algorithm. This is actually an advantage of the technique, because optimizing a global objective function tends to be expensive in both time and space.
Ability to Handle Different Cluster Sizes: we have to decide how to treat clusters of different sizes when they are merged, for example whether points in a small cluster should carry the same weight as points in a large one.