The objectives of this section are to:
define clustering
outline the various applications of clustering
delve into the various types of clustering
define the various types of clusters
introduce some of the major clustering algorithms
Cluster analysis aims to group data objects based on the available information that describes the objects and their relationships.
The main goal is to group similar objects together: the greater the similarity within a group and the greater the difference between groups, the better and more distinct the clustering.
For instance, Figure 1 (on the left) shows a set of data points, while Figure 2 (on the right) outlines two possible clusters that may exist.
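The idea of high similarity within a group and large differences between groups can be made concrete with a small sketch. The points, the cluster labels "a" and "b", and the two helper functions below are hypothetical and chosen only for illustration; Euclidean distance is assumed as the (dis)similarity measure, which the text does not prescribe.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points, hand-assigned to two clusters for illustration.
clusters = {
    "a": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "b": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}

def mean_within_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)

def mean_between_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for points_a, points_b in combinations(clusters.values(), 2)
        for p in points_a
        for q in points_b
    ]
    return sum(pairs) / len(pairs)

within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"mean within-cluster distance:  {within:.3f}")
print(f"mean between-cluster distance: {between:.3f}")
```

Under this assumed measure, a small within-cluster distance combined with a large between-cluster distance corresponds to a more distinct clustering.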
Clustering and clusters are not synonymous. A clustering is an entire collection of clusters; a cluster, on the other hand, is just one part of the entire picture. There are different types of clusters and also different types of clustering.
So how does one define a cluster? What characteristics must each grouping have in order to be considered a cluster? Clusters can be created based on varying characteristics; these characteristics define the cluster and are used to determine what exactly constitutes a cluster.
Clusters can be
A clustering, as previously stated, is an entire collection of clusters. We can classify clusterings based on cluster nesting, exclusiveness of data objects, and inclusiveness of data objects.
Nesting: This separation is based on whether clusters may be nested. A hierarchical clustering is a collection of nested clusters; that is, it allows clusters to exist within bigger clusters. A partitional clustering, by contrast, prohibits clusters that are subsets of other clusters.
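The nesting distinction can be sketched by modelling each cluster as a set of objects. The object labels and the helper names `has_nesting` and `is_properly_nested` are assumptions made for this example, not terminology from the text.

```python
from itertools import combinations

# Hypothetical objects labelled "a".."f"; each cluster is a set of labels.
hierarchical = [
    frozenset("abcdef"),  # top-level cluster containing everything
    frozenset("abc"),     # nested inside the top-level cluster
    frozenset("def"),     # nested inside the top-level cluster
    frozenset("ab"),      # nested inside {"a", "b", "c"}
]
partitional = [frozenset("ab"), frozenset("c"), frozenset("def")]

def has_nesting(clusters):
    """True if some cluster is a proper subset of another cluster."""
    return any(a < b or b < a for a, b in combinations(clusters, 2))

def is_properly_nested(clusters):
    """True if every pair of clusters is either disjoint or nested."""
    return all(
        a <= b or b <= a or a.isdisjoint(b)
        for a, b in combinations(clusters, 2)
    )

print(has_nesting(hierarchical))         # clusters sit inside bigger clusters
print(has_nesting(partitional))          # no cluster is a subset of another
print(is_properly_nested(hierarchical))  # pairs are nested or disjoint
```

Here the partitional example has no cluster contained in another, while every pair of clusters in the hierarchical example is either nested or disjoint.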
Exclusiveness of data objects: This separation is based on whether a data object may belong to one cluster or to more than one. An exclusive clustering, as the name suggests, stipulates that each data object can exist in only one cluster, while an overlapping clustering allows data objects to be grouped in two or more clusters. A real-world example is the breakdown of personnel at a school. An overlapping clustering would allow a student to also be grouped as an employee, while an exclusive clustering would demand that the person be assigned to whichever group is more important.
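The school example can be written out as a short sketch. The names of the people and the helpers `membership_counts` and `is_exclusive` are hypothetical and serve only to illustrate the distinction.

```python
from collections import Counter

# Hypothetical school personnel; "cho" is both a student and an employee.
overlapping = {
    "students": {"ana", "ben", "cho"},
    "employees": {"cho", "dev"},
}
# In the exclusive version, "cho" is placed only in the employee group.
exclusive = {
    "students": {"ana", "ben"},
    "employees": {"cho", "dev"},
}

def membership_counts(clustering):
    """Count how many clusters each object belongs to."""
    return Counter(
        person for members in clustering.values() for person in members
    )

def is_exclusive(clustering):
    """True if no object appears in more than one cluster."""
    return all(n == 1 for n in membership_counts(clustering).values())

print(is_exclusive(overlapping))
print(is_exclusive(exclusive))
print(membership_counts(overlapping)["cho"])
```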
Inclusiveness of data objects: This separation is based on whether every data object must be grouped. A complete clustering assigns every object to a cluster, while a partial clustering allows some data objects to be left unassigned.
The rest of this chapter focuses on the three major techniques in cluster analysis that fall into the three categories stated above.
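As a minimal sketch of the complete versus partial distinction, the check below compares the full set of objects with the objects that appear in some cluster. The object labels and the helpers `unassigned` and `is_complete` are assumptions made for this example.

```python
# Hypothetical data set of five labelled objects.
objects = {"p1", "p2", "p3", "p4", "p5"}

complete = {"c1": {"p1", "p2"}, "c2": {"p3", "p4", "p5"}}
partial = {"c1": {"p1", "p2"}, "c2": {"p3"}}  # p4 and p5 are left out

def unassigned(objects, clustering):
    """Return the objects that belong to no cluster."""
    assigned = set().union(*clustering.values())
    return set(objects) - assigned

def is_complete(objects, clustering):
    """True if every object is assigned to at least one cluster."""
    return not unassigned(objects, clustering)

print(is_complete(objects, complete))
print(is_complete(objects, partial))
print(sorted(unassigned(objects, partial)))
```

In practice, the objects left out of a partial clustering are often ones regarded as noise or outliers, although the text does not say which objects should be left unassigned.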