How K-Means Clustering Works
K-Means partitions data into K clusters by alternating two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points, repeating until assignments stabilize. It's fast, scalable, and ideal for roughly spherical clusters in medium-to-large datasets.
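A minimal sketch of that assign-then-update loop using scikit-learn; the make_blobs data is a synthetic stand-in, not the actual Customer Segments records:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 200 points, 5 features, 4 latent groups
X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=42)

# KMeans alternates the two steps internally until assignments converge:
#   1. assign each point to its nearest centroid
#   2. move each centroid to the mean of its assigned points
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.shape)  # (4, 5): one centroid per cluster
print(np.bincount(labels))            # points assigned to each cluster
```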
About the Customer Segments Dataset
200 synthetic customer records covering spending, income, and loyalty, well suited to clustering-based market segmentation.
- Samples: 200
- Features: 5
- Type: Numeric
- Category: Partition-based
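Because the five features sit on different scales (e.g. income in dollars vs. a loyalty score), standardizing them first keeps any one feature from dominating the Euclidean distances K-Means relies on. A hedged sketch; the random matrix below is an illustrative stand-in, not the dataset's actual values or schema:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Illustrative stand-in for the 200 x 5 numeric feature matrix
X = rng.normal(size=(200, 5))

# Rescale each column to zero mean and unit variance before clustering
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(3), X_scaled.std(axis=0).round(3))
```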
Key Metrics to Watch
Silhouette Score
Measures how similar each point is to its own cluster compared with the nearest neighboring cluster. Ranges from −1 to +1; higher is better.
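Computable directly with scikit-learn's silhouette_score; synthetic stand-in data again:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette over all points: near +1 = well separated,
# near 0 = overlapping clusters, negative = likely misassigned points
print(silhouette_score(X, labels))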
Calinski-Harabasz Index
Ratio of between-cluster to within-cluster variance. Higher values indicate denser, well-separated clusters.
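The same fitted labels can be scored with calinski_harabasz_score (synthetic stand-in data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Between-cluster dispersion divided by within-cluster dispersion;
# higher means denser, better-separated clusters
print(calinski_harabasz_score(X, labels))
```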
Davies-Bouldin Index
Average similarity between each cluster and its most similar cluster. Lower is better.
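And likewise with davies_bouldin_score, where lower is better (synthetic stand-in data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Averages, over clusters, each cluster's worst-case similarity to
# another cluster; 0 is the best possible score
print(davies_bouldin_score(X, labels))
```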
Inertia (Within-Cluster SSE)
Sum of squared distances from each point to its assigned centroid. Lower indicates tighter clusters.
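scikit-learn exposes this as the fitted model's inertia_ attribute; the manual sum below should reproduce it on the same synthetic stand-in data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Recompute within-cluster SSE by hand: squared distance from each
# point to its assigned centroid, summed over all clusters
sse = sum(((X[kmeans.labels_ == k] - c) ** 2).sum()
          for k, c in enumerate(kmeans.cluster_centers_))
print(kmeans.inertia_, sse)  # the two values should match
```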
When to Use K-Means Clustering
K-Means Clustering belongs to the Partition-based family of clustering algorithms. These methods divide data into non-overlapping subsets. They work best when clusters are roughly spherical and similar in size.
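Because K must be fixed in advance, a common heuristic is to sweep several candidate values and compare the metrics above, looking for an "elbow" in inertia and a peak in silhouette. A minimal sketch on synthetic stand-in data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)

# Inertia always decreases as K grows, so look for where the drop
# flattens out; silhouette typically peaks near the true cluster count
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```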
Related Examples
- K-Means Clustering on Iris Flowers (150 samples, 4 features)
- K-Means Clustering on Wine Quality (178 samples, 13 features)
- K-Medoids Clustering on Customer Segments (200 samples, 5 features)
- DBSCAN Clustering on Customer Segments (200 samples, 5 features)