Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters.
– You start with raw, unlabelled data, and the endpoint is a set of clusters.
– Each cluster is distinct from the others, and the objects within each cluster are similar to each other.
How does it work?
– Hierarchical clustering starts by treating each observation as a separate cluster.
– Then it iteratively executes the following two steps: (1) identify the two clusters that are closest together, and (2) merge them into one.
– This process repeats until all the clusters have been merged into a single cluster.
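The loop above can be sketched in a few lines of plain Python. This is an illustrative, unoptimized sketch using centroid distance on 1-D points; the helper names (`closest_pair`, `agglomerate`) are made up for this example.

```python
def closest_pair(clusters):
    """Return indices of the two clusters whose centroids are closest."""
    best, best_d = (0, 1), float("inf")
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            ci = sum(clusters[i]) / len(clusters[i])
            cj = sum(clusters[j]) / len(clusters[j])
            d = abs(ci - cj)
            if d < best_d:
                best_d, best = d, (i, j)
    return best

def agglomerate(points):
    # Step 0: treat each observation as its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Repeat: find the closest pair, merge it, until one cluster remains.
    while len(clusters) > 1:
        i, j = closest_pair(clusters)
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerate([1.0, 2.0, 9.0, 10.0])
print(merges)
```

The returned list records each merge in order; the final entry always contains every object, which is exactly the stopping condition described above.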
Hierarchical clustering is of two types:
(1) Agglomerative (AGNES)
You build clusters in a bottom-up approach. Agglomerative clustering is the most common type of hierarchical clustering and is used to group objects into clusters based on their similarity; it is also known as AGNES (Agglomerative Nesting). The algorithm starts by treating each object as a singleton cluster. Pairs of clusters are then successively merged until all objects belong to one big cluster. The result is a tree-based representation of the objects, called a dendrogram.
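In practice you would rarely hand-roll this; SciPy's `scipy.cluster.hierarchy` module implements agglomerative clustering directly. A minimal sketch (the toy data values are made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two visually separated groups (made-up values).
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [8.0, 8.0], [8.5, 8.2], [8.2, 7.8]])

# Build the merge tree bottom-up (AGNES-style) with average linkage.
# `Z` encodes the same tree a dendrogram plot would draw.
Z = linkage(X, method="average")

# Cut the tree to obtain two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would plot the tree-based representation mentioned above.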
(2) Divisive (DIANA)
You build clusters in a top-down approach. It starts by placing all objects in a single large cluster. At each iteration, the most heterogeneous cluster is divided into two. The process repeats until every object is in its own cluster.
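DIANA itself is not part of the common Python libraries, but the top-down idea can be sketched in plain Python. This toy version works on 1-D points, measures heterogeneity as a cluster's range, and splits at the largest gap; the function names (`split`, `diana_like`) and the splitting heuristic are simplifying assumptions, not the full DIANA algorithm:

```python
def split(cluster):
    """Split a 1-D cluster at its largest gap (simple heuristic)."""
    pts = sorted(cluster)
    gaps = [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return pts[:cut], pts[cut:]

def diana_like(points):
    # Step 0: one big cluster containing every object.
    clusters = [sorted(points)]
    # Repeatedly divide the most heterogeneous (widest-spread) cluster
    # until every object sits in its own singleton cluster.
    while any(len(c) > 1 for c in clusters):
        widest = max(clusters, key=lambda c: max(c) - min(c))
        clusters.remove(widest)
        clusters.extend(split(widest))
    return clusters

result = sorted(diana_like([1.0, 2.0, 9.0, 10.0]))
print(result)
```

Note the mirror symmetry with the agglomerative version: AGNES merges the closest pair at each step, while this top-down sketch divides the most spread-out cluster at each step.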
Thanks for reading, shoot any questions you have!