Chapter 3

Unsupervised Learning: Clustering

Group unlabeled data by structure and similarity to discover segments.

Clustering finds natural groups in data without labels, revealing segments, communities, and structure. The methods below range from fast centroid-based partitioning to density-based and hierarchical approaches that handle arbitrary shapes.

Start with k-Means for fast general clustering.
Use HDBSCAN when clusters have arbitrary shapes or you need noise detection.
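To make the noise-detection idea concrete, here is a minimal sketch of plain DBSCAN, the simpler relative of HDBSCAN, written in standard-library Python. It is an illustration under stated assumptions, not a substitute for a library implementation: the names `dbscan` and `region_query`, the toy points, and the values `eps=1.5` and `min_samples=3` are all chosen for this example, and the neighbor search is a brute-force scan.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN with a brute-force neighbor search (illustrative only)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Toy data (made up for this sketch): two elongated chains and one isolated point.
points = [(x, 0) for x in range(10)] + [(x, 5) for x in range(10)] + [(20, 20)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

On this toy data the two chains receive labels 0 and 1, and the isolated point is labeled -1 (noise). HDBSCAN builds on the same density idea but removes the need to pick a single `eps`.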

Note

scikit-learn's clustering documentation covers k-Means and related algorithms, and describes k-Means as minimizing within-cluster variance (the inertia, or within-cluster sum of squares).
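The within-cluster variance objective can be sketched in a few lines of standard-library Python. This is a bare-bones version of Lloyd's algorithm, not scikit-learn's implementation: the function names, the toy points, and the explicit starting centroids are assumptions made for illustration. The starting centroids are passed in by hand so the run is deterministic. Library implementations typically use smarter seeding such as k-means++ with several restarts, because a poor start can converge to a poor local minimum.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (empty clusters stay put)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity k-Means minimizes."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until centroids stop moving."""
    centroids = list(init_centroids)
    history = []  # inertia after each iteration
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        history.append(inertia(points, labels, new_centroids))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids, history


# Toy data (made up for this sketch): two well-separated 2x2 squares of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids, history = kmeans(points, init_centroids=[points[0], points[1]])
print(labels, centroids, history[-1])
```

Each iteration can only lower the inertia or leave it unchanged, which is why the loop terminates. On this toy data the centroids settle at (0.5, 0.5) and (10.5, 10.5) with a final inertia of 4.0.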

#	Algorithm	Best for	Common fields
1	k-Means	Fast general clustering	Customer segmentation, vector clustering, compression
2	Hierarchical Clustering	Cluster trees/dendrograms	Biology, market research, document grouping
3	DBSCAN / HDBSCAN	Arbitrary-shaped clusters, noise detection	Geospatial data, anomaly detection, fraud, network analysis
4	Gaussian Mixture Models	Soft probabilistic clusters	Finance, signal processing, customer behavior
5	Spectral Clustering	Nonlinear cluster structures	Image segmentation, graph data
6	Mean Shift	Mode/peak discovery	Computer vision, spatial clustering