Pipeline Stages
Data Loader: Loads the Customer Segments dataset (200 samples, 5 features)
Imputer: Fills missing values using median strategy
Standard Scaler: Normalizes features to zero mean, unit variance
DBSCAN: Density-based clustering with eps=0.5, min_samples=5
Cluster Plot: Scatter plot showing cluster shapes and noise points
Metrics: Silhouette score, cluster count, noise ratio
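The stages above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the blueprint's actual implementation: the Customer Segments dataset is not reproduced here, so a synthetic 200 x 5 array with some values blanked out stands in for it (an assumption), and the step names are arbitrary. The plotting stage is left out to keep the sketch dependency-free.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Customer Segments dataset (assumption):
# 200 samples, 5 features, with roughly 5% of the values set to NaN.
rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)
X[rng.random(X.shape) < 0.05] = np.nan

# Imputer -> Standard Scaler -> DBSCAN, using the parameters listed above.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("dbscan", DBSCAN(eps=0.5, min_samples=5)),
])

# fit_predict runs the transformers, then DBSCAN; label -1 marks noise.
labels = pipeline.fit_predict(X)
print(labels.shape)
```

How many clusters and noise points come out depends heavily on the data; eps=0.5 is a reasonable starting value on standardized features, but it usually needs tuning per dataset.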
Overview
This pipeline demonstrates a clustering workflow for data with missing values. Median imputation fills the gaps, feature standardization puts all features on a comparable scale, and DBSCAN then performs density-based clustering. The number of clusters is not set in advance; it follows from eps and min_samples, and points in low-density regions are labeled as noise.
Why This Pipeline Works
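DBSCAN excels where K-Means struggles: it can discover clusters of arbitrary shape, does not need the number of clusters specified up front (the count follows from eps and min_samples), and labels low-density points as noise instead of forcing them into a cluster. The preprocessing chain (imputation → scaling) matters because the eps distance threshold applies across all features at once: without scaling, features with larger ranges would dominate the distances, and distances cannot be computed on missing values at all.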
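To make the arbitrary-shape point concrete, here is a small sketch comparing the two algorithms on scikit-learn's two-moons toy data. The dataset, the eps=0.3 setting, and the use of the adjusted Rand index as a yardstick are all choices made for this illustration; they are not part of the blueprint.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaved half-circles: a classic non-convex cluster shape.
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# K-Means is told there are 2 clusters; DBSCAN is not told the count.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI:  {ari_dbscan:.2f}")
```

K-Means splits the plane with a straight boundary, so it tends to cut across both moons, while DBSCAN follows the dense curve of each one. The comparison reverses on data where cluster densities differ widely, which a single eps value handles poorly.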
Expected Output
The pipeline produces cluster labels (with -1 marking noise/outlier points), a scatter plot showing cluster shapes and outliers, and metrics including the Silhouette Score, the number of detected clusters, and the noise ratio. The noise ratio is the percentage of points that could not be assigned to any cluster.
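As a minimal sketch of how the cluster count and noise ratio could be derived from a DBSCAN-style label array, consider the helper below. The function name summarize_labels is hypothetical, and the ratio is returned as a fraction (multiply by 100 for a percentage).

```python
import numpy as np

def summarize_labels(labels):
    """Return (number of clusters, noise fraction) for DBSCAN-style labels."""
    labels = np.asarray(labels)
    # Every distinct label except -1 is a cluster; -1 marks noise.
    n_clusters = len(set(labels.tolist()) - {-1})
    noise_ratio = float(np.mean(labels == -1)) if labels.size else 0.0
    return n_clusters, noise_ratio

# Hand-made example: two clusters (0 and 1) and two noise points out of eight.
n_clusters, noise_ratio = summarize_labels([0, 0, 1, 1, -1, 0, -1, 1])
print(n_clusters, noise_ratio)  # 2 0.25
```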
Evaluation Metrics
Silhouette Score: cluster cohesion and separation, computed on non-noise points; ranges from -1 to 1, higher is better
Number of clusters: determined by the algorithm from eps and min_samples
Noise ratio: percentage of points classified as outliers (label -1)
Calinski-Harabasz Index: ratio of between-cluster to within-cluster dispersion; higher is better
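The two score-based metrics can be computed with scikit-learn after dropping noise points, as sketched below. The helper name score_without_noise and the toy data are assumptions for illustration; both scores are undefined with fewer than two clusters, so the helper returns None in that case.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

def score_without_noise(X, labels):
    """Silhouette and Calinski-Harabasz scores on non-noise points only."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    keep = labels != -1  # drop points DBSCAN marked as noise
    n_clusters = len(np.unique(labels[keep]))
    # Both scores need at least 2 clusters and more samples than clusters.
    if n_clusters < 2 or keep.sum() <= n_clusters:
        return None, None
    return (
        silhouette_score(X[keep], labels[keep]),
        calinski_harabasz_score(X[keep], labels[keep]),
    )

# Toy data: three well-separated blobs, with the first ten points
# relabeled as noise to mimic DBSCAN output.
X, y = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
labels = y.copy()
labels[:10] = -1

sil, ch = score_without_noise(X, labels)
print(f"silhouette={sil:.3f}, calinski_harabasz={ch:.1f}")
```

Excluding noise makes the scores describe only the clustered points, so they are best read alongside the noise ratio: a high silhouette with most points discarded as noise is not a good result.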