Chapter 03

Unsupervised Learning: Clustering

Group unlabeled data by structure and similarity to discover segments.

Clustering finds natural groups in data without labels, revealing segments, communities, and structure. The methods below range from fast centroid-based partitioning to density-based and hierarchical approaches that handle arbitrary shapes.

  • Start with k-Means for fast general clustering.
  • Use HDBSCAN when clusters have arbitrary shapes or you need noise detection.
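
The trade-off in the two bullets above can be seen on a small non-convex dataset. The sketch below is illustrative, not a benchmark: it assumes scikit-learn is installed, uses the synthetic two-moons data from `make_moons`, and substitutes DBSCAN for HDBSCAN because DBSCAN is available in every scikit-learn release (`sklearn.cluster.HDBSCAN` requires version 1.3 or later). The `eps` and `min_samples` values are guesses chosen for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons: a standard example of non-convex clusters.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# k-Means partitions space around two centroids, so it tends to cut each moon.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters through dense regions; label -1 marks noise points.
# eps and min_samples are illustrative values for this dataset only.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
n_noise = int(np.sum(dbscan_labels == -1))

print(f"k-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI:  {ari_dbscan:.2f} ({n_noise} points flagged as noise)")
```

On convex, roughly equal-sized blobs the ranking would likely be less clear-cut, which is why k-Means remains a reasonable first attempt.
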
Note

scikit-learn's clustering documentation covers k-Means and related algorithms, and frames k-Means as minimizing the within-cluster sum of squares (which it calls inertia).
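
To make the k-Means objective concrete, the following sketch recomputes the within-cluster sum of squares by hand and compares it with the `inertia_` attribute of a fitted `KMeans` model. The blob data, the choice of three clusters, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three Gaussian blobs (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Within-cluster sum of squares: squared distance from each point
# to the centroid of the cluster it was assigned to.
assigned_centroids = km.cluster_centers_[km.labels_]
wcss = float(np.sum((X - assigned_centroids) ** 2))

print(f"manual WCSS: {wcss:.3f}")
print(f"inertia_:    {km.inertia_:.3f}")
```

The two numbers should agree up to floating-point rounding. Dividing the sum by the number of points gives an average squared distance per point, which is the sense in which the objective is described as a variance.
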

| # | Algorithm | Best for | Common fields |
|---|-----------|----------|---------------|
| 1 | k-Means | Fast general clustering | Customer segmentation, vector clustering, compression |
| 2 | Hierarchical Clustering | Cluster trees/dendrograms | Biology, market research, document grouping |
| 3 | DBSCAN / HDBSCAN | Arbitrary-shaped clusters, noise detection | Geospatial data, anomaly detection, fraud, network analysis |
| 4 | Gaussian Mixture Models | Soft probabilistic clusters | Finance, signal processing, customer behavior |
| 5 | Spectral Clustering | Nonlinear cluster structures | Image segmentation, graph data |
| 6 | Mean Shift | Mode/peak discovery | Computer vision, spatial clustering |
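
As a rough orientation, each algorithm family listed above has a counterpart in scikit-learn. The sketch below fits all six on one small synthetic dataset. It assumes scikit-learn is installed; the blob data and every hyperparameter (`n_clusters=3`, `eps=0.8`, and so on) are illustrative choices, not recommendations. DBSCAN again stands in for the DBSCAN / HDBSCAN row.

```python
import numpy as np
from sklearn.cluster import (
    AgglomerativeClustering,
    DBSCAN,
    KMeans,
    MeanShift,
    SpectralClustering,
)
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Small synthetic dataset with three well-separated blobs (illustrative only).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)

# Estimators that expose fit_predict and return one hard label per point.
models = {
    "k-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
    "Spectral": SpectralClustering(n_clusters=3, random_state=42),
    "Mean Shift": MeanShift(),  # bandwidth is estimated from the data
}
labels = {name: model.fit_predict(X) for name, model in models.items()}

# A Gaussian mixture also yields soft assignments: one probability
# per component for every point, with each row summing to 1.
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)
labels["Gaussian Mixture"] = gmm.predict(X)
soft = gmm.predict_proba(X)

for name, lab in labels.items():
    n_found = len(set(lab.tolist()) - {-1})  # -1 is DBSCAN's noise label
    print(f"{name}: {n_found} clusters")
```

Note that DBSCAN and Mean Shift are not told how many clusters to find, while the other four take that number as an input. That difference is often as important in practice as the cluster shapes each method can represent.
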