Chapter 4 Cluster Analysis
Section 2 K-Means
Page 3 K-Means Algorithm

Objectives

The objectives of this section are:
to explain the K-Means clustering algorithm
to introduce its limitations
to examine the various major parts

Outcomes

By the time you have completed this section you will be able to:
briefly explain the K-Means clustering algorithm
list the major steps in the algorithm

The K-Means basic algorithm creates a couple of additional issues that must be considered and in some situations resolved in order to provide a realistic output.

Handling Empty Clusters:

This occurs when no points are assigned to a centriod during the assignment step, the re-calculation step does not get rid of this cluster, and it also does not re-calculate the centriod value because no points are being used and so essentially we will have an output with k-1 cluster. The only solution is to choose a replacement centriod. This replacement centriod can be picked by choosing the point that is farthest away from any current centriod or by choosing a replacement from the cluster that has the highest mean value.

Outliers:

Outliers are points that are far away from the centriod and they can unduly influence centriod values and in turn this can skew cluster grouping and increase the amount of time needed to find an optimal solution. Outliers need to be discovered and eliminated beforehand so as to avoid these complications. There a number of approaches used to avoid and remove outliers but that is beyond the scope of this course.