Clustering

The goal of clustering is to partition a dataset into subgroups of similar or homogeneous data points.

The definition of similarity depends on the domain and the application.
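To make the dependence on the application concrete, here is a minimal Python sketch comparing two common measures, Euclidean distance and cosine similarity, on two hypothetical word-count vectors (the vectors and function names are illustrative, not from the source). The two documents point in the same direction, so cosine similarity treats them as identical. Euclidean distance treats them as far apart because one vector is ten times longer.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: small only when every coordinate is close."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 when they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical word counts for two documents on the same topic,
# where the second document is ten times longer than the first.
short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]

print(euclidean_distance(short_doc, long_doc))  # ≈ 20.12 (far apart)
print(cosine_similarity(short_doc, long_doc))   # ≈ 1.0 (same direction)
```

Which measure is appropriate depends on whether overall magnitude (here, document length) should count as a difference in the application at hand.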

Table of contents
  1. K-Means Clustering
  2. Hierarchical Clustering

K-Means Clustering

The number of clusters $K$ is specified in advance.
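A minimal sketch of K-means using Lloyd's algorithm (alternating assignment and update steps) in plain Python, assuming Euclidean distance and random initialisation from the data points. The function name `kmeans` and its parameters `n_iters` and `seed` are illustrative assumptions. Note that `k` is a required argument: the algorithm cannot start without it.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Minimal K-means via Lloyd's algorithm; k must be chosen up front."""
    rng = random.Random(seed)
    # Initialise the k centroids by sampling k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Two well-separated toy groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up sharing one label and the last three another; which group receives label 0 depends on the random initialisation. Library implementations typically add smarter initialisation (such as k-means++) and multiple restarts.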


  • There is no definitive method for choosing the optimal $K$ in advance.
    • Hierarchical clustering does not require the number of clusters to be specified in advance.
    • One option is to run hierarchical clustering first to get a sense of how many clusters the data suggests, then run K-means with that $K$.
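The hierarchical-then-K-means workflow can be sketched as follows, assuming single-linkage agglomerative clustering and a simple heuristic for reading the result: cut the dendrogram where the merge distance jumps the most. Both choices, and the function names `single_linkage_merge_heights` and `suggest_k`, are illustrative assumptions; the source does not specify a linkage or a rule for picking $K$. The suggested value would then be passed as $K$ to a K-means routine.

```python
import math
from itertools import combinations


def single_linkage_merge_heights(points):
    """Naive agglomerative clustering with single linkage.

    Returns the distance at which each successive merge happens, i.e. the
    heights a dendrogram would draw.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Single linkage: cluster distance = distance of the closest pair.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, i, j = best
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps i valid
        del clusters[j]
        heights.append(d)
    return heights


def suggest_k(heights):
    """Hypothetical heuristic: cut the dendrogram at its largest jump."""
    n = len(heights) + 1  # n points produce n - 1 merges
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=lambda idx: gaps[idx])
    # Stopping after merge m (0-based) leaves n - (m + 1) clusters.
    return n - (m + 1)


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
k = suggest_k(single_linkage_merge_heights(points))
print(k)  # → 2
```

On this toy data the merges within each group happen at distances near 1 or below, while the final merge joining the two groups happens near 8.8, so the largest jump suggests $K = 2$. Other linkages or cut rules can suggest a different $K$ on less clearly separated data.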

Hierarchical Clustering
