Chapter 4 Cluster Analysis
Section 1 Clustering Basics
Page 3 What is Cluster Analysis

Objectives

The objectives of this section are:
define clustering
outline the various applications of clustering
delve into the various types of clustering
define the various types of clusters
introduce some of the major clustering algorithms

Outcomes

By the time you have completed this section you will be able to:
list some of the applications of cluster analysis
define clustering
list the various types of clusters and clustering
list the various types of clustering algorithms

What is Cluster Analysis?

Cluster analysis aims to group data objects based on the available information that describes the objects and their relationships.
The main goal is to group similar objects together: the greater the similarity within a group and the greater the difference between groups, the better and more distinct the clustering.
For instance, Figure 1 shows a set of data points, while Figure 2 outlines two possible clusters that may exist among them.
[Figure 1: Data Points]
[Figure 2: Data Points Clustered]
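
The goal of high similarity within a group and large differences between groups can be checked numerically. The sketch below is only an illustration: the points, the group labels "A" and "B", and the choice of Euclidean distance as the measure of dissimilarity are all assumptions, not data taken from the figures.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D data points, already split into two groups for illustration.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}

def mean_within_distance(points):
    """Average pairwise distance between the objects of one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between_distance(points_a, points_b):
    """Average distance between objects taken from two different clusters."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))

within_a = mean_within_distance(clusters["A"])
within_b = mean_within_distance(clusters["B"])
between = mean_between_distance(clusters["A"], clusters["B"])

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

A small within-group distance together with a large between-group distance suggests that the grouping reflects real structure in the data.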


Clustering and clusters are not synonymous. A clustering is an entire collection of clusters; a cluster, on the other hand, is just one part of the entire picture. There are different types of clusters and also different types of clustering.

Types of Clusters

So how does one define a cluster? What characteristics must each grouping have in order to be considered a cluster? Clusters can be created based on varying characteristics; these characteristics define the cluster and are used to determine what exactly constitutes a cluster.
Clusters can be

Types of Clustering

As previously stated, a clustering is an entire collection of clusters. We can classify clusterings by cluster nesting, exclusiveness of data objects, and inclusiveness of data objects.
Nesting: This separation is based on whether clusters may be nested. A hierarchical clustering is a collection of nested clusters, meaning that it allows clusters to exist within bigger clusters, while a partitional clustering prohibits subclusters.
Exclusiveness of data objects: This separation is based on whether a data object may belong to one cluster or to more than one. An exclusive clustering, as the name suggests, stipulates that each data object can exist in only one cluster, while an overlapping clustering allows data objects to be grouped in two or more clusters. A real-world example would be the breakdown of personnel at a school. An overlapping clustering would allow a student to also be grouped as an employee, while an exclusive clustering would demand that the person be placed in whichever single group is more important.
Inclusiveness of data objects: This separation is based on whether all data objects must be grouped. A complete clustering assigns every object to a cluster, while a partial clustering allows some data objects to be left unassigned.
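
The three distinctions above can be expressed as simple checks on a clustering stored as a mapping from cluster name to a set of members. This is a minimal sketch: the people, the cluster names, and the helper functions `has_nesting`, `is_exclusive`, and `is_complete` are all hypothetical and exist only for illustration.

```python
# Hypothetical people at a school; the names are made up for illustration.
people = {"ana", "ben", "cy", "dee"}

# Overlapping: "ben" is both a student and an employee.
overlapping = {"students": {"ana", "ben"}, "employees": {"ben", "cy"}}
# Exclusive: every person appears in at most one cluster.
exclusive = {"students": {"ana"}, "employees": {"ben", "cy"}}
# Nested: "students" and "employees" both sit inside the larger "school" cluster.
nested = {"school": {"ana", "ben", "cy"}, "students": {"ana", "ben"}, "employees": {"cy"}}

def has_nesting(clustering):
    """True if some cluster is strictly contained in another cluster."""
    groups = list(clustering.values())
    return any(a < b for a in groups for b in groups)

def is_exclusive(clustering):
    """True if no object appears in more than one cluster."""
    seen = set()
    for members in clustering.values():
        if seen & members:
            return False
        seen |= members
    return True

def is_complete(clustering, objects):
    """True if every object is assigned to at least one cluster."""
    assigned = set().union(*clustering.values())
    return objects <= assigned

print(has_nesting(nested), has_nesting(exclusive))
print(is_exclusive(overlapping), is_exclusive(exclusive))
print(is_complete(exclusive, people))
```

In this example "dee" belongs to no cluster, so both the overlapping and the exclusive clustering are partial rather than complete.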

Types of Clustering Algorithms

  1. Partitioning-based clustering
    • K-means clustering
    • K-medoids clustering
    • EM (expectation maximization) clustering
  2. Hierarchical clustering
    • Divisive clustering
    • Agglomerative clustering
  3. Density-Based Methods
    • Regions of dense points separated by sparser regions of relatively low density
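
To give a feel for what a partitioning-based algorithm does, here is a minimal sketch of k-means, the first algorithm in the list above. It assumes Euclidean distance, 2-D points, and initial centroids supplied by the caller. The data and the function names `assign` and `k_means` are hypothetical, and the sketch is not a substitute for a library implementation.

```python
from math import dist

def assign(points, centroids):
    """Assignment step: put each point in the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = assign(points, centroids)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, groups = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(groups)
```

The result is an exclusive, complete, partitional clustering: every point ends up in exactly one of the k groups. The outcome of k-means generally depends on the initial centroids, which is why they are passed in explicitly here.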

 

The rest of this chapter focuses on the three major cluster analysis techniques that fall into the three categories stated above.