What is Principal Component Analysis (PCA)?#
Principal Component Analysis (PCA) is an unsupervised feature extraction technique used for dimensionality reduction, where new features are created to preserve the most important information in the data. PCA:
Reduces the number of features
Preserves most of the variance
Transforms correlated variables into a smaller set of uncorrelated principal components (PCs)
Each principal component is:
A linear combination of original features
Uncorrelated with others
Ordered by importance (variance captured)
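To make this concrete, here is a minimal sketch of the idea using scikit-learn on synthetic data (both the data and the parameter choices are illustrative assumptions, not this chapter's dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 4 features, two of which are noisy copies of the others,
# so the "true" dimensionality is closer to 2 than 4.
base = rng.normal(size=(200, 2))
X = np.column_stack([base, base + 0.05 * rng.normal(size=(200, 2))])

pca = PCA(n_components=2)         # keep 2 principal components
Z = pca.fit_transform(X)          # new, uncorrelated features

print(Z.shape)                         # (200, 2) -> fewer features
print(pca.explained_variance_ratio_)   # most of the variance kept by 2 PCs
```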
PCA: Looking at Data from a Better Angle#
How can we represent data in fewer dimensions while losing the least information?
Imagine data as a cloud of points. By rotating our view, we can find a direction where the data spreads the most.
This direction is PC1 (captures maximum variance)
Next, PCA finds another direction that:
Is perpendicular (orthogonal) to PC1
Captures the next highest variance
This becomes PC2
Figure: PCA rotates axes to align with directions of maximum variance. Source: Machine Learning Plus
Each additional component captures the next most important variation while remaining orthogonal to previous ones.
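A quick numerical way to see this "better angle" idea: on a synthetic 2D cloud (a toy example of my own, not from the chapter), the variance of the projections onto PC1 is at least as large as along any other direction, including a random one.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Elongated 2D cloud: most of the spread lies along one diagonal direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
Xc = X - X.mean(axis=0)

pc1 = PCA(n_components=2).fit(Xc).components_[0]   # direction of maximum spread
random_dir = rng.normal(size=2)
random_dir /= np.linalg.norm(random_dir)

print("variance along PC1:   ", (Xc @ pc1).var())
print("variance along random:", (Xc @ random_dir).var())
```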
Why This Idea Works#
The key intuition behind PCA is:
Variance represents information
Low variance → little information
High variance → meaningful patterns
PCA keeps high-variance directions and removes low-variance noise.
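As a small illustration (synthetic data, assumed purely for demonstration), a feature that barely varies carries almost no information, so PCA assigns it a tiny share of the explained variance and it can be dropped with little loss:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

signal = rng.normal(scale=5.0, size=(300, 1))   # high-variance, informative
noise = rng.normal(scale=0.1, size=(300, 1))    # low-variance, ~no information
X = np.hstack([signal, noise])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)   # e.g. ~[0.999, 0.001]
```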
Removing Redundancy#
Real-world features are often correlated (e.g., income & spending).
PCA:
Combines correlated features
Produces uncorrelated components
Removes redundancy
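A short sketch of this effect (the "income"/"spending"-style features below are synthetic, assumed for illustration): the original columns are strongly correlated, while the principal-component scores are not.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

income = rng.normal(size=500)
spending = 0.8 * income + 0.2 * rng.normal(size=500)   # correlated with income
X = np.column_stack([income, spending])

Z = PCA(n_components=2).fit_transform(X)

print(np.corrcoef(X, rowvar=False))   # strong off-diagonal correlation
print(np.corrcoef(Z, rowvar=False))   # off-diagonal ~0: redundancy removed
```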
The Underlying Mechanism#
PCA is based on three key ideas:
Covariance matrix → captures relationships between features
Eigenvectors → directions of maximum variance
Eigenvalues → importance of each direction
These define a new coordinate system aligned with the most informative directions.
PCA rotates the data to align with its most informative directions, allowing us to reduce dimensions while preserving structure.
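All three ingredients can be computed directly with NumPy. A minimal sketch on synthetic data (my own example, not the chapter's dataset):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
Xc = X - X.mean(axis=0)                      # center the data first

cov = np.cov(Xc, rowvar=False)               # covariance matrix (3 x 3)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns ascending eigenvalues; sort descending so PC1 comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("variance captured by each direction:", eigenvalues)
print("principal directions (columns):\n", eigenvectors)
```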
Mathematical Idea (Core of PCA)#
In matrix form, the transformation is:

$$
Z = XW
$$

Where:

X = original data matrix
W = matrix of eigenvectors (principal directions)
Z = transformed data (principal components)
PCA projects data onto directions where variance is maximized.
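In code, this projection is a single matrix product. A minimal NumPy sketch (synthetic data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                       # PCA works on centered data

# W: eigenvectors of the covariance matrix, sorted by eigenvalue (descending)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1]][:, :2]    # keep the top-2 directions

Z = Xc @ W                                    # Z = XW: the principal components
print(Z.shape)                                # (100, 2)
```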
Principal Components, Eigenvectors, and Eigenvalues#
Each principal component is a linear combination of original features.
Example:

$$
PC_1 = v_1 X_1 + v_2 X_2 + \dots + v_n X_n
$$

Where:

v = weights (from eigenvectors)
X = original features
These weights tell us how much each original feature contributes to a given principal component.
Eigenvectors define the directions (principal components)
Eigenvalues measure how much variance each component captures
Larger eigenvalue = more important component
Figure: PCA projects data onto new axes that capture maximum variance. Source: Medium
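In scikit-learn's PCA, the `components_` attribute holds the weight vectors v and `explained_variance_` holds the corresponding eigenvalues; a small sketch on synthetic data (assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(150, 4))

pca = PCA().fit(X)

print(pca.components_[0])        # weights v for PC1: one per original feature
print(pca.explained_variance_)   # eigenvalues: variance captured per component
```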
Key Properties of Principal Components#
Principal components are orthogonal, meaning they are uncorrelated
PC1 captures the maximum variance
PC2 captures the next highest variance
Each additional component captures less variance than the previous one
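These properties are easy to verify numerically; a quick check on synthetic, correlated data (assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated features

pca = PCA().fit(X)
Z = pca.transform(X)

# Orthogonal directions: components_ @ components_.T is ~ the identity matrix
print(np.round(pca.components_ @ pca.components_.T, 6))

# Variance decreases from PC1 onward
print(pca.explained_variance_)

# Principal-component scores are uncorrelated: covariance matrix is ~diagonal
print(np.round(np.cov(Z, rowvar=False), 6))
```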
Step-by-Step: How PCA Works#
Conceptually, PCA follows these steps:
1. Standardize the data, so that all features contribute equally.
2. Compute the covariance matrix, to understand how features vary together.
3. Find eigenvectors and eigenvalues: eigenvectors give the directions, eigenvalues give the importance of those directions.
4. Sort the components by importance, so the directions with the highest variance come first.
5. Select the top k components, based on how much variance you want to preserve.
6. Project the data into the new space, to obtain a lower-dimensional representation.
Figure: PCA identifies new axes (principal components) that align with directions of maximum variance in the data. Source: Devopedia.
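Putting the steps together, here is a compact from-scratch sketch in NumPy (the function name and the synthetic data are my own, for illustration; in practice you would rely on a library implementation):

```python
import numpy as np

def pca_from_scratch(X, k):
    """Toy PCA following the steps above; returns projected data and eigenvalues."""
    # 1. Standardize (center and scale each feature)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvectors (directions) and eigenvalues (importance)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort components by importance (largest eigenvalue first)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # 5. Select the top k components
    W = eigenvectors[:, :k]

    # 6. Project the data into the new space
    Z = X_std @ W
    return Z, eigenvalues

rng = np.random.default_rng(8)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))     # correlated features
Z, eigenvalues = pca_from_scratch(X, k=2)
print(Z.shape, eigenvalues / eigenvalues.sum())             # (150, 2) + variance ratios
```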
In this chapter, we will use the Iris dataset throughout, which has 150 samples and 4 features; small enough to inspect, structured enough to be interesting. Later we will also see how to implement PCA simply using Python's scikit-learn library.
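As a brief preview of that scikit-learn version (loading the data via `sklearn.datasets.load_iris` is one common way; the full workflow is revisited later):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)   # standardize so features contribute equally

pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)

print(Z.shape)                              # (150, 2)
print(pca.explained_variance_ratio_)        # roughly [0.73, 0.23] for standardized Iris
```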