The objectives of this section are:
define density-based clustering
explain the major parts
introduce the DBSCAN algorithm
list the limitations and advantages of this method
By the time you have completed this section you will be able to:
explain the basic DBSCAN algorithm
label points into the appropriate group type
determine which scenarios this algorithm would yield good results.
Now that we have defined the main players we can now move to outlining the algorithm. The DBSCAN algorithm involves grouping any two core points that are close enough into the same cluster. Border points that are close enough to a core point is put into the same cluster as the core point and noise points are discarded. Figure 3 below shows a high-level description of this algorithm.
This is the basic algorithm, two things to note is that the two user defined parameters that play a crucial role are not randomly chosen but must be calculated. The radius (E) as previously shown cannot be too small or too big and the number of threshold points (MinPts) must also be calculated.
DBSCAN and other density-based algorithms are important because they are resistant to noise and can handle clusters of various shapes and sizes. They are a lot of clusters that DBSCAN can find that K-mean would not be able to find. Figure 4 is one such example. Limitations of this algorithm are that DBSCAN has trouble when the clusters have widely varying densities it also has trouble dealing with high-dimensional data.