Unsupervised Learning: Clustering
Group unlabeled data by structure and similarity to discover segments.
Clustering finds natural groups in data without labels, revealing segments, communities, and structure. The methods below range from fast centroid-based partitioning to density-based and hierarchical approaches that handle arbitrary shapes.
- Start with k-Means for fast general clustering.
- Use HDBSCAN when clusters have arbitrary shapes or you need noise detection.
Note
scikit-learn's clustering documentation covers k-Means and related algorithms, and frames k-Means as minimizing within-cluster variance (the within-cluster sum of squares, which it calls inertia).
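As a rough illustration of that objective, the sketch below implements Lloyd's algorithm (the standard k-Means iteration) in plain Python and records the within-cluster sum of squares after each pass. The function name `kmeans`, the toy data, and the seeds are assumptions made for this example. It is not scikit-learn's implementation, which adds smarter initialization and multiple restarts.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids, and WCSS per pass."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k randomly chosen data points
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Objective: within-cluster sum of squares (WCSS) for this pass.
        history.append(
            sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        )
    return labels, centroids, history


# Toy data (an assumption for this sketch): two well-separated 2-D blobs.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

labels, centroids, history = kmeans(points, k=2)
print("WCSS per pass:", [round(v, 2) for v in history])
```

Each pass can only lower or hold the objective, because both the assignment step and the update step reduce (or leave unchanged) the summed squared distance. This is why k-Means converges, though only to a local optimum that depends on the starting centroids.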
| # | Algorithm | Best for | Common fields |
|---|---|---|---|
| 1 | k-Means | Fast general clustering | |
| 2 | Hierarchical Clustering | Cluster trees/dendrograms | |
| 3 | DBSCAN / HDBSCAN | Arbitrary-shaped clusters, noise detection | |
| 4 | Gaussian Mixture Models | Soft probabilistic clusters | |
| 5 | Spectral Clustering | Nonlinear cluster structures | |
| 6 | Mean Shift | Mode/peak discovery | |
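To make the contrast between the centroid-based and density-based rows concrete, here is a small sketch using scikit-learn on the two-moons toy dataset. DBSCAN stands in for HDBSCAN because it is the simpler of the two to configure. The `eps` and `min_samples` values, the dataset, and the use of the adjusted Rand index (ARI) as a score are choices made for this illustration, not recommendations.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: a common non-convex test case.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# Centroid-based: assumes compact, roughly spherical groups.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Density-based: eps and min_samples are hand-picked for this toy data.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# ARI compares each labeling with the known moon membership (1.0 = identical).
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
n_noise = int((dbscan_labels == -1).sum())  # DBSCAN marks noise with label -1

print(f"k-Means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN ARI:  {dbscan_ari:.2f} ({n_noise} points labeled as noise)")
```

On data like this, k-Means tends to cut each crescent in half, while a density-based method can follow the curved shapes and flag outliers as noise. Recent scikit-learn releases also provide `sklearn.cluster.HDBSCAN`, which removes the need to choose a single `eps`.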