What is Principal Component Analysis (PCA)?#

Principal Component Analysis (PCA) is an unsupervised feature extraction technique used for dimensionality reduction, where new features are created to preserve the most important information in the data. PCA:

  • Reduces the number of features

  • Preserves most of the variance

  • Transforms correlated variables into a smaller set of uncorrelated principal components (PCs)

Each principal component is:

  • A linear combination of original features

  • Uncorrelated with others

  • Ordered by importance (variance captured)
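As a quick sketch of these properties, the snippet below (using scikit-learn on synthetic, nearly rank-2 data; the data itself is made up for illustration) reduces 5 correlated features to 2 principal components while keeping almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples, 5 correlated features (illustrative only)
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))            # 2 underlying signals
mixing = rng.normal(size=(2, 5))            # spread them across 5 features
X = base @ mixing + 0.05 * rng.normal(size=(100, 5))

pca = PCA(n_components=2)                   # reduce 5 features -> 2 PCs
Z = pca.fit_transform(X)

print(Z.shape)                              # (100, 2)
print(pca.explained_variance_ratio_.sum())  # near 1: most variance is kept
```

Because the 5 features were built from only 2 underlying signals, two components recover nearly all of the variance.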

PCA: Looking at Data from a Better Angle#

How can we represent data in fewer dimensions while losing the least information?

Imagine data as a cloud of points. By rotating our view, we can find a direction where the data spreads the most.

This direction becomes PC1 (it captures the maximum variance)

Next, PCA finds another direction that:

  • Is perpendicular (orthogonal) to PC1

  • Captures the next highest variance

This becomes PC2

Figure: PCA rotates axes to align with directions of maximum variance. Source: Machine Learning Plus

Each additional component captures the next most important variation while remaining orthogonal to previous ones.
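The "rotate until the spread is largest" idea can be checked numerically. In this sketch (synthetic 2-D data, stretched and then rotated by 30 degrees), we project the centered cloud onto many candidate directions and see that the variance peaks near the known 30-degree direction:

```python
import numpy as np

rng = np.random.default_rng(1)
# Elongated 2-D point cloud: stretched along one axis, then rotated 30 degrees
raw = rng.normal(size=(500, 2)) * [3.0, 0.5]
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = raw @ R.T
X -= X.mean(axis=0)          # center the cloud before measuring variance

# Try many candidate directions; the projection variance peaks near 30 degrees
angles = np.linspace(0, np.pi, 180)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
variances = ((X @ dirs.T) ** 2).mean(axis=0)
best = np.degrees(angles[variances.argmax()])
print(round(best))           # close to 30: where the cloud spreads the most
```

The direction that maximizes this projected variance is exactly what PCA returns as PC1; the perpendicular direction is PC2.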

Why This Idea Works#

The key intuition behind PCA is:

Variance represents information

  • Low variance → little information

  • High variance → meaningful patterns

PCA keeps high-variance directions and removes low-variance noise.

Removing Redundancy#

Real-world features are often correlated (e.g., income & spending).

PCA:

  • Combines correlated features

  • Produces uncorrelated components

  • Removes redundancy
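A small sketch of this decorrelation, using the income-and-spending example from above (the numbers are invented for illustration): the two raw features are strongly correlated, but their principal component scores are not.

```python
import numpy as np

rng = np.random.default_rng(2)
income = rng.normal(50, 10, size=1000)
spending = 0.8 * income + rng.normal(0, 3, size=1000)  # correlated with income
X = np.column_stack([income, spending])
Xc = X - X.mean(axis=0)

# PCA via eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
Z = Xc @ eigvecs                         # principal component scores

print(round(np.corrcoef(X.T)[0, 1], 2))  # strongly correlated features
print(round(np.corrcoef(Z.T)[0, 1], 2))  # components: correlation ~ 0
```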

The Underlying Mechanism#

PCA is based on three key ideas:

  • Covariance matrix → captures relationships between features

  • Eigenvectors → directions of maximum variance

  • Eigenvalues → importance of each direction

These define a new coordinate system aligned with the most informative directions.

PCA rotates the data to align with its most informative directions, allowing us to reduce dimensions while preserving structure.
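The three ideas above can be sketched in a few lines of NumPy (synthetic data for illustration): the covariance matrix summarizes feature relationships, its eigenvectors are unit-length directions, and each eigenvalue equals the variance of the data along its eigenvector.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # 3 correlated features
Xc = X - X.mean(axis=0)                                  # center first

cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> use eigh
order = eigvals.argsort()[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors: unit-length directions; eigenvalues: variance along them
print(np.allclose(np.linalg.norm(eigvecs, axis=0), 1.0))           # True
print(np.allclose(eigvals, np.var(Xc @ eigvecs, axis=0, ddof=1)))  # True
```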

Mathematical Idea (Core of PCA)#

\[ Z = XW \]

Where:

  • \(X\) = original (mean-centered) data matrix

  • \(W\) = matrix of eigenvectors (principal directions)

  • \(Z\) = transformed data (principal components)

PCA projects data onto directions where variance is maximized.
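The formula \(Z = XW\) can be verified directly. In this sketch (random data; scikit-learn assumed as in the rest of the chapter), multiplying the centered data by the matrix of principal directions reproduces scikit-learn's own `transform` output:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)               # PCA assumes centered data

pca = PCA(n_components=2).fit(X)
W = pca.components_.T                 # columns = principal directions (eigenvectors)
Z = Xc @ W                            # the projection Z = XW

print(np.allclose(Z, pca.transform(X)))  # True: same transformed data
```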

Principal Components, Eigenvectors, and Eigenvalues#

Each principal component is a linear combination of original features.

Example:

\[ PC_1 = v_{11}X_1 + v_{12}X_2 + v_{13}X_3 \]
\[ PC_2 = v_{21}X_1 + v_{22}X_2 + v_{23}X_3 \]

Where:

  • \(v\) = weights (entries of the eigenvectors)

  • \(X\) = original features

These weights tell us how much each original feature contributes to a given principal component.

Eigenvectors define the directions (principal components)
Eigenvalues measure how much variance each component captures

  • Larger eigenvalue = more important component

Figure: PCA projects data onto new axes that capture maximum variance. Source: Medium
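A short sketch of these weights in practice (synthetic 3-feature data; scikit-learn stores the eigenvector weights in `components_`): computing \(PC_1\) "by hand" as a weighted sum of the centered features matches the library's output, and the eigenvalues come out sorted by importance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # 3 features X1..X3

pca = PCA().fit(X)
v = pca.components_       # row i holds the weights (v_i1, v_i2, v_i3) for PC_i

# PC_1 of the first sample, computed by hand as a weighted sum of features
x = X[0] - pca.mean_
pc1_manual = v[0, 0] * x[0] + v[0, 1] * x[1] + v[0, 2] * x[2]
print(np.isclose(pc1_manual, pca.transform(X)[0, 0]))  # True

# Eigenvalues: larger value -> more variance captured by that component
print(pca.explained_variance_)  # sorted in decreasing order
```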

Key Properties of Principal Components#

  • Principal components are orthogonal, meaning they are uncorrelated

  • PC1 captures the maximum variance

  • PC2 captures the next highest variance

  • Each additional component captures less variance than the previous one
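All four properties can be checked numerically. This sketch (random 4-feature data, scikit-learn) confirms that the principal directions are orthogonal, the component scores are uncorrelated, and the captured variance decreases from PC1 onward:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))

pca = PCA().fit(X)
Z = pca.transform(X)
W = pca.components_

# Orthogonal directions: W W^T is the identity matrix
print(np.allclose(W @ W.T, np.eye(4)))                 # True

# Uncorrelated scores: the covariance matrix of Z is diagonal
C = np.cov(Z.T)
print(np.allclose(C, np.diag(np.diag(C))))             # True

# Variance decreases from PC1 to PC4
print(np.all(np.diff(pca.explained_variance_) <= 0))   # True
```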

Step-by-Step: How PCA Works#

Conceptually, PCA follows these steps:

  1. Standardize the data
    so that all features contribute equally

  2. Compute the covariance matrix
    to understand how features vary together

  3. Find eigenvectors and eigenvalues

    • Eigenvectors give the directions

    • Eigenvalues give the importance of those directions

  4. Sort the components by importance
    so the directions with the highest variance come first

  5. Select the top \(k\) components
    based on how much variance you want to preserve

  6. Project the data into the new space
    to obtain a lower-dimensional representation

Figure: PCA identifies new axes (principal components) that align with directions of maximum variance in the data. Source: Devopedia.
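The six steps above can be sketched as a minimal from-scratch implementation in NumPy (synthetic data for illustration; the function name `pca_from_scratch` is ours, not a library API):

```python
import numpy as np

def pca_from_scratch(X, k):
    """Minimal PCA following the six steps above (illustrative sketch)."""
    # 1. Standardize: zero mean, unit variance per feature
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(Xs, rowvar=False)
    # 3. Eigenvectors (directions) and eigenvalues (importance)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort components by decreasing eigenvalue
    order = eigvals.argsort()[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the top k components
    W = eigvecs[:, :k]
    # 6. Project into the new k-dimensional space
    return Xs @ W, eigvals

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
Z, eigvals = pca_from_scratch(X, k=2)
print(Z.shape)                                      # (150, 2)
print(round(eigvals[:2].sum() / eigvals.sum(), 2))  # share of variance kept
```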

In this chapter, we will use the Iris dataset throughout: 150 samples and 4 features, small enough to inspect and structured enough to be interesting. Later, we will also see how to implement PCA simply using Python's scikit-learn library.