Pipeline Stages

data source

Data Loader

Loads the Customer Segments dataset (200 samples, 5 features)

preprocessing

Imputer

Fills missing values using median strategy

preprocessing

Standard Scaler

Normalizes features to zero mean, unit variance

model

DBSCAN

Density-based clustering with eps=0.5, min_samples=5

evaluation

Cluster Plot

Scatter plot showing cluster shapes and noise points

evaluation

Metrics

Silhouette score, cluster count, noise ratio

Overview

This pipeline demonstrates a production-grade clustering workflow: starting with median imputation to handle missing values, followed by feature standardization, then DBSCAN for density-based clustering that automatically discovers the number of clusters and identifies outliers as noise points.

Why This Pipeline Works

DBSCAN excels where K-Means struggles — it discovers clusters of arbitrary shape, automatically determines the number of clusters, and naturally identifies noise points. The preprocessing chain (imputation → scaling) ensures robust operation on real-world datasets with missing values and varying feature scales.

Expected Output

The pipeline produces cluster labels (including -1 for noise/outlier points), a visualization showing cluster shapes and outliers, and metrics including Silhouette Score and the number of detected clusters. The noise ratio indicates what percentage of points couldn't be assigned to any cluster.

Evaluation Metrics

Silhouette Score

cluster cohesion and separation (excluding noise points)

Number of clusters

automatically determined by the algorithm

Noise ratio

percentage of points classified as outliers

Calinski-Harabasz Index

cluster density and separation measure

Ready to try it?

Load this blueprint into the interactive pipeline editor and run it on sample data — no setup required.

Try this Blueprint

Pipeline Stages

data source

Data Loader

Loads the Customer Segments dataset (200 samples, 5 features)

preprocessing

Imputer

Fills missing values using median strategy

preprocessing

Standard Scaler

Normalizes features to zero mean, unit variance

model

DBSCAN

Density-based clustering with eps=0.5, min_samples=5

evaluation

Cluster Plot

Scatter plot showing cluster shapes and noise points

evaluation

Metrics

Silhouette score, cluster count, noise ratio

Why This Pipeline Works

Density-Based Clustering Pipeline

Pipeline Stages

Overview

Why This Pipeline Works

Expected Output

Evaluation Metrics

Related Topics

Ready to try it?

Density-Based Clustering Pipeline

Pipeline Stages

Overview

Why This Pipeline Works

Expected Output

Evaluation Metrics

Related Topics

Ready to try it?