The objectives of this section are:
define clustering
outline the various applications of clustering
delve into the various types of clustering
define the various types of clusters
introduce some popular clustering algorithms
By the time you have completed this section, you will be able to:
list some of the applications of cluster analysis
define clustering
list the various types of clusters and clustering
list some of the well-known clustering algorithms
Cluster analysis aims to group data objects using only the information that describes the objects and their relationships. The main goal is to place similar objects in the same group: the greater the similarity within a group and the greater the difference between groups, the better or more distinct the clustering.
Clustering is a form of unsupervised learning because, as previously mentioned, we do not have a previously labeled data set on which to train.
For instance, Figure 1 (on the left) shows a set of data points, while Figure 2 (below) outlines possible clusters that may exist. If this were supervised learning, we would have a data set that had already been classified and could be used to control and guide the process.
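The idea of high similarity within groups and large differences between groups can be made concrete with a small sketch. The example below assumes that similarity is measured by Euclidean distance (a smaller distance means more similar) and uses hypothetical toy data; the function names are illustrative and are not taken from any library.

```python
from itertools import combinations
from math import dist

def mean_within_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)

def mean_between_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(pairs) / len(pairs)

# Hypothetical toy data: two groups of 2-D points (an assumption for illustration).
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"mean within-cluster distance:  {within:.3f}")
print(f"mean between-cluster distance: {between:.3f}")
```

A small within-cluster distance combined with a large between-cluster distance suggests a distinct clustering under this particular measure. Other similarity measures would give different numbers.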
Clustering and clusters are not synonymous. A clustering is an entire collection of clusters; a cluster, on the other hand, is just one part of the entire picture. There are different types of clusters and also different types of clustering.
Clusters can be created based on varying characteristics, some of which are mentioned briefly below.
Well-Separated: Clusters are well-separated when the distance between any two points in different clusters is large and the distance between any two points in the same cluster is relatively small.
Prototype-Based: A cluster is defined by a prototype (for example, the centroid of the cluster); that is, each object is assigned to the cluster whose prototype it is closest to.
Graph-Based: When the data is represented as a graph, with objects as nodes and the connections between them as links, a cluster is a group of objects that are connected to each other but have no connection to objects outside the group.
Density-Based: This type of cluster is defined by the density of the region. A cluster is a dense region of objects surrounded by a region of low density.
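The four cluster types above can each be sketched as a small check. This is a minimal illustration rather than a full clustering algorithm, and it rests on several assumptions: distance is Euclidean, "large" and "relatively small" are read as every between-cluster distance exceeding every within-cluster distance, a graph-based cluster is read as a connected component, and "dense" means having at least `min_neighbours` other points within `radius`. All names, data, and parameter values are hypothetical.

```python
from math import dist, inf

def is_well_separated(clusters):
    """Well-separated: every between-cluster distance exceeds every within-cluster distance."""
    within = [dist(p, q) for c in clusters
              for i, p in enumerate(c) for q in c[i + 1:]]
    between = [dist(p, q)
               for i, a in enumerate(clusters) for b in clusters[i + 1:]
               for p in a for q in b]
    return min(between, default=inf) > max(within, default=0.0)

def nearest_prototype(point, prototypes):
    """Prototype-based: index of the prototype closest to the point."""
    return min(range(len(prototypes)), key=lambda i: dist(point, prototypes[i]))

def connected_components(nodes, edges):
    """Graph-based: groups of nodes linked to each other with no link leaving the group."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

def dense_points(points, radius, min_neighbours):
    """Density-based: points with at least min_neighbours other points within radius."""
    return [p for i, p in enumerate(points)
            if sum(1 for j, q in enumerate(points)
                   if j != i and dist(p, q) <= radius) >= min_neighbours]

# Hypothetical toy data (assumed for illustration only).
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
separated = is_well_separated(clusters)
closest = nearest_prototype((9.0, 9.0), [(0.0, 0.0), (10.0, 10.0)])
components = connected_components(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")])
all_points = clusters[0] + clusters[1] + [(5.0, 5.0)]  # (5, 5) sits alone
dense = dense_points(all_points, radius=1.5, min_neighbours=2)
```

In this toy data the isolated point (5.0, 5.0) has no neighbours within the chosen radius, so it lies in a low-density region and is left out of `dense`. The choice of `radius` and `min_neighbours` is arbitrary here and would need tuning on real data.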