Unsupervised Learning: Discovering Patterns Without Answers#
A Different Kind of Learning#
So far, most of our machine learning tasks follow a familiar pattern:
Inputs (features)
Outputs (labels)
The model learns a mapping from input → output. This is called supervised learning. But now consider a different situation.
You are given a dataset with many data points and features, but no labels. No correct answers. No categories. Just raw data. This raises an important question:
Can we learn something meaningful just from the structure of the data itself?
Figure: Unsupervised learning groups unlabeled data into patterns based on similarity. Source: MathWorks
This is where unsupervised learning begins.
The Problem of Too Many Features#
Consider analyzing customer behavior with features like:
Age
Income
Purchase frequency
Website activity
Now scale this to:
50, 100, or even 500 features
We quickly face challenges:
Visualization becomes impossible
Computation slows down
Many features are redundant or noisy
Relationships become hard to interpret
The data exists in a high-dimensional space, beyond our intuitive understanding. So we ask:
Can we simplify the data while preserving its important information?
Part A: Dimensionality Reduction#
Dimensionality reduction addresses this challenge.
It reduces the number of features while preserving the most important structure.
Think of it like:
Summarizing a long story
Compressing an image
Less data, but same essential meaning
One of the most powerful techniques for this is Principal Component Analysis (PCA).
The Curse of Dimensionality#
Before PCA, we must understand why high dimensions are problematic.
Figure: Hughes Phenomenon: Adding features helps initially, but too many features with limited data reduces performance. Source: Medium
As dimensions increase:
1. Data Becomes Sparse#
Points spread far apart, making the space mostly empty.
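A quick way to build intuition is to simulate this. Below is a minimal NumPy sketch (the point count of 500 and the chosen dimensions are arbitrary): as the dimension grows, the average distance from each point to its nearest neighbor keeps increasing, even though the number of points stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_nn_distance(n_points, d):
    """Average distance from each point to its nearest neighbor."""
    X = rng.uniform(size=(n_points, d))            # points in the unit cube [0, 1]^d
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                   # ignore distance to self
    return np.sqrt(np.maximum(d2, 0).min(axis=1)).mean()

for d in [2, 10, 100, 1000]:
    print(f"d={d:5d}  mean nearest-neighbor distance: {mean_nn_distance(500, d):.2f}")
```

With the same 500 points, neighbors that were close in 2D end up far apart in 1000D: a fixed amount of data covers an ever-thinner slice of the space.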
2. Distance Becomes Less Meaningful#
The distances to the nearest and farthest points become almost equal
It becomes hard to tell which points are truly similar (the simulation below makes this concrete)
Figure: Distances lose meaning as dimensionality increases. Source: Medium
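The same effect can be checked numerically. In this small sketch (again NumPy; the 2,000 uniform points and single random query point are arbitrary choices), the relative gap between the nearest and farthest neighbor of a query point shrinks toward zero as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(2000, d))        # 2,000 random points in [0, 1]^d
    q = rng.uniform(size=d)                # one random query point
    dists = np.linalg.norm(X - q, axis=1)  # distance from the query to every point
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative gap (max - min) / min: {contrast:.3f}")
```

When that gap is nearly zero, "nearest neighbor" carries little information, which is exactly why distance-based methods struggle in high dimensions.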
3. More Data is Required#
Covering a high-dimensional space requires exponentially more data, so models need far more samples to learn effectively.
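A rough back-of-the-envelope calculation makes "exponentially" concrete. If we want, say, 10 representative values per feature, covering the space requires about 10^d cells, and each cell needs data:

```python
# Hypothetical coverage estimate: 10 "bins" per feature means
# bins ** d cells to cover the whole feature space.
bins = 10
for d in [1, 2, 3, 10, 50]:
    print(f"d={d:3d}  cells to cover: {bins ** d:.2e}")
```

Two features need 100 cells; fifty features would need 10^50, far more cells than any dataset could ever fill.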
4. Noise and Redundancy Increase#
Irrelevant features
Correlated features
Added noise
Intuition#
2D → easy to understand
3D → harder
100D → nearly impossible to reason about
High-dimensional space behaves very differently from what we expect.
Why This Matters#
Because of these effects:
Model performance can degrade
Computation becomes inefficient
Interpretation becomes difficult
This is why we use dimensionality reduction:
Reduce complexity while preserving structure.
And one of the most important tools for this is PCA.
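As a closing sketch of what PCA does in practice (assuming scikit-learn and NumPy are available; the 50-feature dataset below is synthetic, built from 3 hidden factors plus noise), it can compress the data down to a handful of components while keeping almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 50 features that really vary along only 3 hidden factors.
latent = rng.normal(size=(500, 3))                        # 3 underlying factors
mixing = rng.normal(size=(3, 50))                         # spread across 50 features
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))   # plus a little noise

pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)                 # 500 x 5 instead of 500 x 50

print(X_reduced.shape)                           # (500, 5)
print(pca.explained_variance_ratio_.round(3))    # nearly all variance in the first 3
```

The explained-variance ratios reveal the true dimensionality: after the first three components, the remaining ones capture almost nothing, so the 50 original features can be safely summarized by 3.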