How BIRCH Clustering Works
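BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clusters very large datasets efficiently in two stages. It first compresses the data into a CF-tree (clustering feature tree), a compact summary built in a single pass over the points. It then applies a secondary clustering step to the subclusters stored in the tree's leaves rather than to the raw points.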
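To make the CF-tree summary more concrete, here is a minimal sketch of a single clustering feature, the triple (point count, linear sum, sum of squared norms) kept for each subcluster. The class name `ClusteringFeature`, the helper `try_absorb`, and the use of the radius as the absorption test are illustrative assumptions, not the internals of any particular library, and the tree itself (node splitting, branching factor) is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusteringFeature:
    """Summary of a subcluster: point count, linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        values = [float(x) for x in point]
        return cls(1, values, sum(x * x for x in values))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive, so merging never needs the raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # Root mean squared distance of the summarised points from the centroid:
        # R^2 = SS / N - ||centroid||^2
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


def try_absorb(cf: ClusteringFeature, point: Sequence[float],
               threshold: float) -> Optional[ClusteringFeature]:
    """Hypothetical helper: return the merged CF if its radius stays within
    `threshold`, otherwise None (the point would start a new subcluster)."""
    candidate = cf.merge(ClusteringFeature.from_point(point))
    return candidate if candidate.radius() <= threshold else None


cf = ClusteringFeature.from_point((1.0, 2.0))
cf = cf.merge(ClusteringFeature.from_point((3.0, 4.0)))
print(cf.n, cf.centroid(), cf.radius())

near = try_absorb(cf, (2.0, 3.0), threshold=1.5)    # close to the centroid
far = try_absorb(cf, (20.0, 30.0), threshold=1.5)   # far from the centroid
print(near is not None, far is not None)
```

Because the three fields simply add up under a merge, a subcluster can keep absorbing points without storing them, which is what lets the summary be built in one pass.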
About the Iris Flowers Dataset
A classic 150-sample dataset: 4 measurements per flower (sepal length, sepal width, petal length, petal width), with 50 samples from each of 3 species. It is a standard benchmark for clustering and classification demos.
- Samples: 150
- Features: 4
- Type: Numeric
- Category: Hierarchical
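As a rough illustration of running BIRCH on this dataset, the sketch below uses scikit-learn's `Birch` estimator with `load_iris`. Standardising the features and the specific values `threshold=0.5`, `branching_factor=50`, and `n_clusters=3` are assumptions made for the example, not settings taken from this page.

```python
from collections import Counter

from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data                          # 150 samples, 4 numeric features
X_scaled = StandardScaler().fit_transform(X)  # assumption: put features on one scale

# threshold bounds the radius of each leaf subcluster; n_clusters drives the
# secondary clustering step applied to those subclusters.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X_scaled)

print("leaf subclusters:", len(model.subcluster_centers_))
print("final cluster sizes:", Counter(labels.tolist()))
```

Lowering `threshold` generally produces more, smaller subclusters in the tree, while raising it compresses the data more aggressively.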
Key Metrics to Watch
Silhouette Score
Compares how close each point is to the other points in its own cluster with how close it is to the nearest other cluster, averaged over all points. Ranges from −1 to +1; higher is better.
Calinski-Harabasz Index
Ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom. It has no upper bound; higher values indicate denser, better-separated clusters.
Davies-Bouldin Index
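For each cluster, the similarity to its most similar other cluster (combined within-cluster scatter relative to the distance between the two centroids), averaged over all clusters. The minimum is 0; lower is better.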
Inertia (Within-Cluster SSE)
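Sum of squared distances from each point to its assigned centroid. Lower indicates tighter clusters, but the value tends to fall as the number of clusters grows, so compare it only between results with the same cluster count.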
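The four metrics above can be worked through by hand on a tiny example. The sketch below is a plain-Python illustration on made-up 2-D points with two obvious clusters; the function names and the toy data are assumptions for the example, and it expects at least two clusters with no duplicate points. In practice, scikit-learn provides `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` in `sklearn.metrics`.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return dict(clusters)


def centroid(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]


def inertia(points, labels):
    # Sum of squared distances from each point to its cluster centroid.
    total = 0.0
    for pts in group_by_label(points, labels).values():
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def silhouette(points, labels):
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in pts) / len(pts)
            for other, pts in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(points, labels):
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(pts) * math.dist(centroid(pts), overall) ** 2
                  for pts in clusters.values())
    within = inertia(points, labels)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    clusters = list(group_by_label(points, labels).values())
    cents = [centroid(pts) for pts in clusters]
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i)
        for i in range(len(clusters))
    ]
    return sum(worst) / len(worst)


# Toy data: two unit squares far apart, labelled as two clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("silhouette:", silhouette(points, labels))
print("calinski-harabasz:", calinski_harabasz(points, labels))
print("davies-bouldin:", davies_bouldin(points, labels))
print("inertia:", inertia(points, labels))
```

For this toy data the inertia works out to 4.0, the Calinski-Harabasz index to 150, and the Davies-Bouldin index to 0.2 (up to floating-point rounding), while the silhouette score is close to +1 because the two squares are far apart.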
When to Use BIRCH Clustering
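BIRCH belongs to the hierarchical family of clustering algorithms. These methods build a tree of clusters, either by merging (agglomerative) or by splitting (divisive), and so reveal multi-scale structure in the data. BIRCH is a natural choice when the dataset is too large to cluster point by point, because it works from the compact CF-tree summary instead of the raw data.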