Exploring Relationships Between Variables

So far, we have examined variables individually: their distributions, typical values, and unusual observations. The next step in exploratory analysis is to understand how variables interact.

Relationships between variables reveal structure in the data. They help us identify patterns, associations, and possible predictive signals.

However, relationships must be interpreted carefully. Association does not imply causation.

Visualizing Relationships

The most common way to examine relationships between two numerical variables is a scatter plot.
Each point represents an observation, positioned according to its values on two variables.

import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(data=df, x="feature1", y="feature2")
plt.title("Relationship between feature1 and feature2")
plt.show()

What to look for:

  • upward trend → positive association

  • downward trend → negative association

  • clusters → subgroups

  • curved pattern → nonlinear relationship

  • no pattern → weak or no relationship

Comparing Groups

When one variable is categorical and the other numeric, we compare distributions across groups.

sns.boxplot(data=df, x="category", y="numeric_value")
plt.title("Distribution across categories")
plt.show()

The box plot shows whether groups differ in their median, spread, or outliers.
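To put numbers on what the box plot draws, a grouped summary can help. The following is a minimal sketch on hypothetical toy data; the column names `category` and `numeric_value` simply mirror the plotting example, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data (not from any real dataset).
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b"],
    "numeric_value": [1.0, 2.0, 3.0, 10.0, 12.0, 14.0],
})

# Per-group count and quartiles: the quantities a box plot is built from.
summary = df.groupby("category")["numeric_value"].describe()
print(summary[["count", "25%", "50%", "75%"]])

# The median alone is often enough for a first comparison.
group_medians = df.groupby("category")["numeric_value"].median()
print(group_medians)
```

If the medians and quartile ranges barely overlap, as in this toy data, the groups are likely to look clearly separated in the plot as well.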

Measuring Correlation

Correlation quantifies the strength and direction of a linear relationship between numeric variables. By default, pandas computes the Pearson correlation coefficient.

df[["feature1", "feature2"]].corr()

Correlation values range from −1 to +1:

  • +1 → perfect positive relationship

  • 0 → no linear relationship

  • −1 → perfect negative relationship

Remember: correlation measures association, not cause.
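Because the coefficient measures only linear association, a value near 0 does not rule out a strong nonlinear relationship. The sketch below illustrates this on synthetic data; the column names (`linear`, `quadratic`, `cubic`) are hypothetical. It also computes the Spearman rank correlation, which is one common alternative when a relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd

# Synthetic, noise-free data on a grid symmetric around zero.
x = np.linspace(-1, 1, 201)
demo = pd.DataFrame({
    "x": x,
    "linear": 2 * x + 1,   # exact straight line
    "quadratic": x ** 2,   # fully determined by x, but not linear or monotonic
    "cubic": x ** 3,       # monotonic, but not linear
})

# Correlation of every column with x, under two different methods.
pearson = demo.corr(method="pearson")["x"]
spearman = demo.corr(method="spearman")["x"]

print(pd.DataFrame({"pearson": pearson, "spearman": spearman}))
```

Here the Pearson coefficient for `quadratic` is essentially zero even though the column is computed directly from `x`, while the Spearman coefficient for `cubic` reaches 1 because the relationship is perfectly monotonic. This is why a scatter plot should accompany any correlation value.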

Multivariate Exploration

Real-world phenomena rarely depend on only two variables. Multivariate exploration examines multiple variables simultaneously.

This helps reveal interactions, hidden groupings, and complex structure that pairwise analysis may miss.

Pairwise Relationships

A pairplot visualizes relationships among many numeric variables at once.

# Restrict to numeric columns, then draw one panel per pair of variables
sns.pairplot(df.select_dtypes(include="number"))
plt.show()

This shows:

  • distributions on the diagonal

  • pairwise scatter plots elsewhere

Because every pair is drawn in one grid, variables that move together, form clusters, or contain outliers are easy to spot.

Correlation Heatmap

A heatmap provides a compact visual summary of correlations across many variables.

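plt.figure(figsize=(8, 6))
# Fix the color scale to [-1, 1] so that a correlation of 0 maps to the neutral midpoint
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm", vmin=-1, vmax=1)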
plt.title("Correlation Heatmap")
plt.show()

This helps identify:

  • strongly related variables

  • redundant features

  • potential predictors
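With many variables, reading the heatmap cell by cell becomes tedious, so it can help to list the strongest pairs directly. The following is a minimal sketch on synthetic data; the column names and the 0.8 cutoff are arbitrary choices for illustration, not recommended defaults.

```python
import numpy as np
import pandas as pd

# Synthetic data: "a_copy" is a noisy duplicate of "a", "b" is unrelated noise.
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
df = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=n),
    "b": rng.normal(size=n),
})

corr = df.corr(numeric_only=True)

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair
# appears once, then flatten to a Series indexed by (row, column).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

threshold = 0.8  # arbitrary illustrative cutoff
strong = pairs[pairs.abs() > threshold].sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(strong)
```

Pairs that appear in this list are candidates for redundancy: keeping both columns may add little information beyond keeping one.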

Why Multivariate Exploration Matters

Many patterns only appear when several variables are considered together.
Looking at several variables together can uncover interactions and group structure that a single-variable summary, or one pair viewed in isolation, would hide.
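A small synthetic example makes this concrete. In the sketch below, the data are constructed so that `x` and `y` are positively correlated overall, yet negatively correlated within each group, a situation often described as Simpson's paradox. The column names, group labels, and all numeric constants are hypothetical and chosen only to produce the effect.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
frames = []
for i, name in enumerate(["g0", "g1", "g2"]):
    # Each group sits further to the right on x ...
    x = rng.uniform(0, 1, size=100) + 2 * i
    # ... and has a higher baseline on y, but a negative slope inside the group.
    y = -x + 4 * i + rng.normal(scale=0.1, size=100)
    frames.append(pd.DataFrame({"group": name, "x": x, "y": y}))
df = pd.concat(frames, ignore_index=True)

# Correlation when the grouping variable is ignored.
overall = df["x"].corr(df["y"])

# Correlation computed separately inside each group.
within = {name: g["x"].corr(g["y"]) for name, g in df.groupby("group")}

print(f"overall: {overall:.2f}")
for name, r in within.items():
    print(f"{name}: {r:.2f}")
```

Ignoring `group` suggests that `y` rises with `x`; conditioning on it reverses the sign. Coloring a scatter plot by the grouping column (for example with the `hue` parameter of `sns.scatterplot`) is one way to see this kind of structure visually.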