What is Principal Component Analysis (PCA)?#
Principal Component Analysis (PCA) is an unsupervised feature extraction technique used for dimensionality reduction, where new features are created to preserve the most important information in the data. PCA:
Reduces the number of features
Preserves most of the variance
Transforms correlated variables into a smaller set of uncorrelated principal components (PCs)
Each principal component is:
A linear combination of original features
Uncorrelated with others
Ordered by importance (variance captured)
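To make this concrete, here is a minimal sketch of the idea using scikit-learn on synthetic data (both the data and the parameter choices are illustrative assumptions, not this chapter's dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 4 features, two of which are noisy copies of the others,
# so the "true" dimensionality is closer to 2 than 4.
base = rng.normal(size=(200, 2))
X = np.column_stack([base, base + 0.05 * rng.normal(size=(200, 2))])

pca = PCA(n_components=2)         # keep 2 principal components
Z = pca.fit_transform(X)          # new, uncorrelated features

print(Z.shape)                         # (200, 2) -> fewer features
print(pca.explained_variance_ratio_)   # most of the variance kept by 2 PCs
```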
PCA: Looking at Data from a Better Angle#
How can we represent data in fewer dimensions while losing the least information?
Imagine data as a cloud of points. By rotating our view, we can find a direction where the data spreads the most.
This direction is PC1 (captures maximum variance)
Next, PCA finds another direction that:
Is perpendicular (orthogonal) to PC1
Captures the next highest variance
This becomes PC2
Figure: PCA rotates axes to align with directions of maximum variance. Source: Machine Learning Plus
Each additional component captures the next most important variation while remaining orthogonal to previous ones.
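A quick numerical way to see this "better angle" idea: on a synthetic 2D cloud (a toy example of my own, not from the chapter), the variance of the projections onto PC1 is at least as large as along any other direction, including a random one.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Elongated 2D cloud: most of the spread lies along one diagonal direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
Xc = X - X.mean(axis=0)

pc1 = PCA(n_components=2).fit(Xc).components_[0]   # direction of maximum spread
random_dir = rng.normal(size=2)
random_dir /= np.linalg.norm(random_dir)

print("variance along PC1:   ", (Xc @ pc1).var())
print("variance along random:", (Xc @ random_dir).var())
```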
Why This Idea Works#
The key intuition behind PCA is:
Variance represents information
Low variance → little information
High variance → meaningful patterns
PCA keeps high-variance directions and removes low-variance noise.
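As a small illustration (synthetic data, assumed purely for demonstration), a feature that barely varies carries almost no information, so PCA assigns it a tiny share of the explained variance and it can be dropped with little loss:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

signal = rng.normal(scale=5.0, size=(300, 1))   # high-variance, informative
noise = rng.normal(scale=0.1, size=(300, 1))    # low-variance, ~no information
X = np.hstack([signal, noise])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)   # e.g. ~[0.999, 0.001]
```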
Removing Redundancy#
Real-world features are often correlated (e.g., income & spending).
PCA:
Combines correlated features
Produces uncorrelated components
Removes redundancy
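A short sketch of this effect (the "income"/"spending"-style features below are synthetic, assumed for illustration): the original columns are strongly correlated, while the principal-component scores are not.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

income = rng.normal(size=500)
spending = 0.8 * income + 0.2 * rng.normal(size=500)   # correlated with income
X = np.column_stack([income, spending])

Z = PCA(n_components=2).fit_transform(X)

print(np.corrcoef(X, rowvar=False))   # strong off-diagonal correlation
print(np.corrcoef(Z, rowvar=False))   # off-diagonal ~0: redundancy removed
```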
The Underlying Mechanism#
PCA is based on three key ideas:
Covariance matrix → captures relationships between features
Eigenvectors → directions of maximum variance
Eigenvalues → importance of each direction
These define a new coordinate system aligned with the most informative directions.
PCA rotates the data to align with its most informative directions, allowing us to reduce dimensions while preserving structure.
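All three ingredients can be computed directly with NumPy. A minimal sketch on synthetic data (my own example, not the chapter's dataset):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)                      # center the data first

cov = np.cov(Xc, rowvar=False)               # covariance matrix (3 x 3)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns ascending eigenvalues; sort descending so PC1 comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("variance captured by each direction:", eigenvalues)
print("principal directions (columns):\n", eigenvectors)
```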
Mathematical Idea (Core of PCA)#
In matrix form, the transformation is:

$$
Z = XW
$$

Where:

X = original data matrix
W = matrix of eigenvectors (principal directions)
Z = transformed data (principal components)
PCA projects data onto directions where variance is maximized.
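In code, this projection is a single matrix product. A minimal NumPy sketch (synthetic data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                       # PCA works on centered data

# W: eigenvectors of the covariance matrix, sorted by eigenvalue (descending)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1]][:, :2]    # keep the top-2 directions

Z = Xc @ W                                    # Z = XW: the principal components
print(Z.shape)                                # (100, 2)
```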
Principal Components, Eigenvectors, and Eigenvalues#
Each principal component is a linear combination of original features.
Example:

$$
PC_1 = v_1 X_1 + v_2 X_2 + \dots + v_n X_n
$$

Where:

v = weights (from eigenvectors)
X = original features
These weights tell us how much each original feature contributes to a given principal component.
Eigenvectors define the directions (principal components)
Eigenvalues measure how much variance each component captures
Larger eigenvalue = more important component
Figure: PCA projects data onto new axes that capture maximum variance. Source: Medium
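In scikit-learn's PCA, the `components_` attribute holds the weight vectors v and `explained_variance_` holds the corresponding eigenvalues; a small sketch on synthetic data (assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(150, 4))

pca = PCA().fit(X)

print(pca.components_[0])        # weights v for PC1: one per original feature
print(pca.explained_variance_)   # eigenvalues: variance captured per component
```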
Key Properties of Principal Components#
Principal components are orthogonal, meaning they are uncorrelated
PC1 captures the maximum variance
PC2 captures the next highest variance
Each additional component captures less variance than the previous one
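These properties are easy to verify numerically; a quick check on synthetic, correlated data (assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated features

pca = PCA().fit(X)
Z = pca.transform(X)

# Orthogonal directions: components_ @ components_.T is ~ the identity matrix
print(np.round(pca.components_ @ pca.components_.T, 6))

# Variance decreases from PC1 onward
print(pca.explained_variance_)

# Principal-component scores are uncorrelated: covariance matrix is ~diagonal
print(np.round(np.cov(Z, rowvar=False), 6))
```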
Step-by-Step: How PCA Works#
Conceptually, PCA follows these steps:
1. Standardize the data, so that all features contribute equally.
2. Compute the covariance matrix, to understand how features vary together.
3. Find eigenvectors and eigenvalues: eigenvectors give the directions, eigenvalues give the importance of those directions.
4. Sort the components by importance, so the directions with the highest variance come first.
5. Select the top k components, based on how much variance you want to preserve.
6. Project the data into the new space, to obtain a lower-dimensional representation.
Figure: PCA identifies new axes (principal components) that align with directions of maximum variance in the data. Source: Devopedia.
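Putting the steps together, here is a compact from-scratch sketch in NumPy (the function name and the synthetic data are my own, for illustration; in practice you would rely on a library implementation):

```python
import numpy as np

def pca_from_scratch(X, k):
    """Toy PCA following the steps above; returns projected data and eigenvalues."""
    # 1. Standardize (center and scale each feature)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvectors (directions) and eigenvalues (importance)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort components by importance (largest eigenvalue first)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # 5. Select the top k components
    W = eigenvectors[:, :k]

    # 6. Project the data into the new space
    Z = X_std @ W
    return Z, eigenvalues

rng = np.random.default_rng(8)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))     # correlated features
Z, eigenvalues = pca_from_scratch(X, k=2)
print(Z.shape, eigenvalues / eigenvalues.sum())             # (150, 2) + variance ratios
```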
In this chapter, we will use the Iris dataset throughout, which has 150 samples and 4 features; small enough to inspect, structured enough to be interesting. Later we will also see how to implement PCA simply using Python's scikit-learn library.
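As a brief preview of that scikit-learn version (loading the data via `sklearn.datasets.load_iris` is one common way; the full workflow is revisited later):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)   # standardize so features contribute equally

pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)

print(Z.shape)                              # (150, 2)
print(pca.explained_variance_ratio_)        # roughly [0.73, 0.23] for standardized Iris
```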