Clustering

The goal of clustering is to partition a dataset into subgroups of similar or homogeneous data points.

The definition of similarity depends on the domain and application.

Table of contents

K-Means Clustering
Hierarchical Clustering

K-Means Clustering

K-means partitions the data into $K$ clusters, where the number of clusters $K$ is pre-specified.
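To make the role of the pre-specified $K$ concrete, here is a minimal pure-Python sketch of the usual assign-then-update iteration (Lloyd's algorithm). The function names, the toy data, and the deterministic farthest-first initialization are illustrative assumptions, not part of the notes; practical implementations typically use random or k-means++ initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def k_means(points, k, n_iter=100):
    """Partition `points` into k clusters; k must be chosen by the caller."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Changing `k` changes the partition, but nothing in the algorithm itself indicates which value is appropriate.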

There is no definitive way to determine the optimal $K$ in advance.
- Hierarchical clustering does not require a pre-specified number of clusters.
- One option is to run hierarchical clustering first to get a sense of how many clusters are appropriate, then run K-means with that $K$.
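The second point can be sketched as follows: build the hierarchy bottom-up, record the distance at which each merge happens, and read off a candidate $K$ from where the merge distances jump. This is only one possible heuristic. The single-linkage choice, the "largest gap" rule, the function names, and the toy data are all assumptions made for illustration, not something the notes prescribe.

```python
import math


def merge_heights(points):
    """Naive agglomerative clustering with single linkage.
    Returns the distance at which each successive merge occurs."""
    clusters = [[p] for p in points]  # start with every point on its own
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
        heights.append(d)
    return heights


def suggest_k(heights):
    """Assumed heuristic: cut just before the largest jump between
    consecutive merge distances. Needs at least three points."""
    n_points = len(heights) + 1
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    # After merge number t (0-indexed), n_points - (t + 1) clusters remain.
    return n_points - (t + 1)


# Toy data: three visually separate groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9),
    (9.0, 1.0), (9.2, 1.1),
]
heights = merge_heights(points)
k = suggest_k(heights)
print(k)
```

The suggested value would then be passed as $K$ to a K-means run. On real data the jump is often less clear-cut, so the result is better treated as a starting point than as an answer.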

Hierarchical Clustering
