Beyond K-Means

Beyond K-Means#

K-Means is a very useful starting point, but it is not the only clustering method. Two important alternatives are:

Hierarchical Clustering
DBSCAN

Hierarchical Clustering#

Hierarchical clustering builds a hierarchy of clusters instead of assigning all points at once.

Key Idea#

Instead of fixing the number of clusters beforehand, hierarchical clustering creates a tree-like structure of clusters. This structure is called a dendrogram

A dendrogram shows:

How clusters are merged or split
The distance at which this happens

By “cutting” the dendrogram at a certain height, we choose the number of clusters.

Figure: Dendrogram illustrating hierarchical clustering. The vertical axis shows the distance at which clusters are merged, and cutting the tree at different heights results in different numbers of clusters. _{Source: Medium}

There are two common approaches:

(1) Agglomerative: (Start small → merge upward) Start with each point as its own cluster, then repeatedly merge the closest clusters.
(2) Divisive: (Start big → split downward) Start with one large cluster, then repeatedly split it

Agglomerative merges clusters step-by-step, while divisive splits clusters step-by-step, forming a hierarchy of clusters.

(1) Agglomerative (Bottom-Up): How it works:#

Start with individual points → merge clusters

Compute all distances → pick the smallest → merge → repeat

Algorithm Steps:#

Initialize:
Treat each data point as its own cluster
Compute Distance Matrix:
Calculate pairwise distances between all clusters

Distance between clusters depends on the linkage method:
- Single Linkage: minimum distance between any two points
- Complete Linkage: maximum distance between any two points
- Average Linkage: average distance between all pairs
- Ward’s Method: minimizes variance within clusters
The linkage method controls how clusters are merged, and can significantly affect the final clustering structure.

Figure: Comparison of linkage methods in hierarchical clustering: single, complete, average, and Ward’s method. Single linkage may create long chains, while complete linkage produces tighter and more compact clusters. _{Source: Analytics Vidhya}

This step gives us a table of distances between every pair of clusters

Example: Distance Matrix

	A	B	C
A	—	2	5
B	2	—	3
C	5	3	—

Find Closest Clusters:
From the distance matrix, identify the two clusters with the smallest distance
- Example: smallest distance = 2 (A–B)
- So clusters A and B will be merged next
Merge Clusters:
Combine the selected clusters into a single cluster
Update Distances:
Recompute distances between the new cluster and all others
(using the same linkage method)
Repeat Steps 3–5
Continue until:
- Only one cluster remains OR
- Desired number of clusters is reached

(2) Divisive (Top-Down): How it works:#

Start with one cluster → split into smaller clusters

Algorithm Steps:#

Initialize:
Put all data points into one cluster
Select Cluster to Split:
Choose the cluster that is most heterogeneous
(e.g., highest variance or largest size)
Split Cluster:
Divide it into two smaller or sub clusters (often using a method like K-Means).cThe goal is to maximize separation between the two new clusters
Evaluate Split:
Check if further splitting improves cluster quality (e.g., better separation or reduced variance).
Repeat Steps 2–4
Continue until:
- Each point is its own cluster OR
- Desired number of clusters is reached

Divisive clustering is less commonly used because splitting is computationally more complex than merging.

Figure: Divisive clustering process (top-down). The algorithm starts with one large cluster and recursively splits it into smaller clusters until the desired structure is obtained. _{Source: GeeksforGeeks}

Why Use Hierarchical Clustering?#

No need to specify number of clusters initially
Provides insight into data structure
Useful for small to medium datasets

Limitation#

Computationally expensive for large datasets
Decisions (merge/split) cannot be undone

# Python Example
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

# Sample dataset
X, _ = make_moons(n_samples=250, noise=0.08, random_state=42)

# Apply Hierarchical Clustering
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
hc_labels = hc.fit_predict(X)

# -----------------------------
# Plot 1: Hierarchical Clusters
# -----------------------------
plt.figure(figsize=(6,5))
plt.scatter(X[:, 0], X[:, 1], c=hc_labels)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Hierarchical Clustering (Ward Linkage)")
plt.show()

# -----------------------------
# Plot 2: Dendrogram
# -----------------------------
Z = linkage(X, method='ward')

plt.figure(figsize=(8,5))
dendrogram(Z)
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()

../_images/e29e08660688c689e29f983ec2428b3c8f94578f72c323165085d28e2cb27558.png

../_images/895d764828da43dee4614a30e339d858bfb0c5b0dbbbca348e378a8a45406350.png

DBSCAN#

DBSCAN stands for: Density-Based Spatial Clustering of Applications with Noise. Unlike K-Means, DBSCAN does not try to create a fixed number of clusters. Instead, it groups together points that are packed closely together and labels isolated points as noise.

Main Idea#

Dense regions form clusters
Sparse regions may be treated as outliers

This makes DBSCAN especially useful when:

Clusters have irregular shapes
The dataset contains noise or outliers
Number of clusters is unknown

Key Parameters:#

DBSCAN uses two parameters: ε (neighborhood radius) and minPts (minimum points) to define dense regions and form clusters.

ε (epsilon): radius defining neighborhood
minPts: minimum number of points required to form a dense region

Types of Points#

Core Point: has at least minPts neighbors within ε
Border Point: close to a core point but not dense enough itself
Noise Point: not close to any cluster

How DBSCAN Works#

Pick an unvisited point
Find all points within distance ε (epsilon)
- This forms the point’s neighborhood
Check if the neighborhood has at least minPts points
- If YES → it is a core point, start a new cluster
- If NO → mark it as noise (may later become a border point)
Expand the cluster:
- Add all points within ε of the core point
- For each new point:
  - If it also has ≥ minPts neighbors → expand further
    This creates a chain of density-connected points
Repeat until all reachable points are added
Move to next unvisited point and repeat

DBSCAN finds clusters based on density, grouping nearby points together while identifying sparse points as noise.

Figure: DBSCAN clustering showing core points (dense regions), border points (edges of clusters), and noise points (outliers). _{Source: GeeksforGeeks}

In this figure:

Core points form the dense center of clusters
Border points lie on the edges of clusters
Noise points are isolated and do not belong to any cluster

DBSCAN builds clusters by expanding from core points and including all density-connected points (points reachable through neighboring core points).

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# -----------------------
# Generate dataset
# -----------------------
X, _ = make_moons(n_samples=200, noise=0.08, random_state=42)

# Parameters
eps = 0.25
minPts = 5

# Pick a starting point
start_idx = 100
start_point = X[start_idx]

# Find neighbors within epsilon
nbrs = NearestNeighbors(radius=eps)
nbrs.fit(X)
neighbors = nbrs.radius_neighbors([start_point], return_distance=False)[0]

# Final DBSCAN result
db = DBSCAN(eps=eps, min_samples=minPts)
labels = db.fit_predict(X)

# -----------------------
# Set up figure
# -----------------------
fig, ax = plt.subplots(figsize=(6, 5))

def draw_base():
    ax.clear()
    ax.set_xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
    ax.set_ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
    ax.set_aspect("equal")

def update(frame):
    draw_base()

    if frame == 0:
        ax.scatter(X[:, 0], X[:, 1], alpha=0.7)
        ax.set_title("Step 1: Original Data")

    elif frame == 1:
        ax.scatter(X[:, 0], X[:, 1], alpha=0.7)
        ax.scatter(start_point[0], start_point[1], s=120, marker='x')
        ax.set_title("Step 2: Select an Unvisited Point")

    elif frame == 2:
        ax.scatter(X[:, 0], X[:, 1], alpha=0.7)
        ax.scatter(start_point[0], start_point[1], s=120, marker='x')
        circle = plt.Circle((start_point[0], start_point[1]), eps, fill=False, linewidth=2)
        ax.add_patch(circle)
        ax.set_title("Step 3: Find Points Within ε")

    elif frame == 3:
        ax.scatter(X[:, 0], X[:, 1], alpha=0.25)
        ax.scatter(X[neighbors, 0], X[neighbors, 1], alpha=0.95)
        ax.scatter(start_point[0], start_point[1], s=120, marker='x')
        circle = plt.Circle((start_point[0], start_point[1]), eps, fill=False, linewidth=2)
        ax.add_patch(circle)
        ax.set_title(f"Step 4: Neighborhood Size = {len(neighbors)}")

    elif frame == 4:
        ax.scatter(X[:, 0], X[:, 1], alpha=0.25)
        ax.scatter(X[neighbors, 0], X[neighbors, 1], alpha=0.95)
        ax.scatter(start_point[0], start_point[1], s=120, marker='x')
        circle = plt.Circle((start_point[0], start_point[1]), eps, fill=False, linewidth=2)
        ax.add_patch(circle)

        if len(neighbors) >= minPts:
            ax.set_title(f"Step 5: Core Point ({len(neighbors)} ≥ {minPts}) → Start Cluster")
        else:
            ax.set_title(f"Step 5: Not a Core Point ({len(neighbors)} < {minPts}) → Noise/Border")

    elif frame == 5:
        unique_labels = sorted(set(labels))
        for label in unique_labels:
            if label == -1:
                ax.scatter(
                    X[labels == label, 0],
                    X[labels == label, 1],
                    color='black',
                    label='Noise'
                )
            else:
                ax.scatter(
                    X[labels == label, 0],
                    X[labels == label, 1],
                    label=f'Cluster {label}'
                )
        ax.legend()
        ax.set_title("Final DBSCAN Clustering Result")

anim = FuncAnimation(fig, update, frames=6, interval=1800, repeat=True)

plt.close(fig)
HTML(anim.to_jshtml())

Why use DBSCAN?#

Does not require specifying the number of clusters in advance
Can detect outliers
Works well for arbitrarily shaped clusters

##Limitation Sensitive to parameter choices such as eps and min_samples

Comparing the Methods#

K-Means

Best for compact, spherical clusters
Requires choosing K

Hierarchical Clustering

Builds a tree-like clustering structure
Useful for exploring relationships between clusters

DBSCAN

Density-based
Handles noise and irregular cluster shapes better

Beyond K-Means

Contents

Beyond K-Means#

Hierarchical Clustering#

Key Idea#

(1) Agglomerative (Bottom-Up): How it works:#

Algorithm Steps:#

(2) Divisive (Top-Down): How it works:#

Algorithm Steps:#

Why Use Hierarchical Clustering?#

Limitation#

DBSCAN#

Main Idea#

Key Parameters:#

Types of Points#

How DBSCAN Works#

Why use DBSCAN?#

Comparing the Methods#