Applying Aggregation Functions Directly to a DataFrame

One of the strengths of pandas is that statistical and aggregation methods can be called directly on a DataFrame or Series. These methods summarize the data without explicit loops or manual calculations.

Common Aggregation Methods

Here are some of the most commonly used methods:

| Method | Description | Works On |
| --- | --- | --- |
| .sum() | Returns the sum of values | DataFrame / Series |
| .mean() | Returns the average (mean) value | DataFrame / Series |
| .count() | Counts non-null values | DataFrame / Series |
| .min() | Returns the minimum value | DataFrame / Series |
| .max() | Returns the maximum value | DataFrame / Series |
| .std() | Returns the standard deviation | DataFrame / Series |
| .var() | Returns the variance | DataFrame / Series |
| .describe() | Generates summary statistics (count, mean, std, min, quartiles, max) | DataFrame / Series |
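
One detail from the table is worth seeing in action: .count() counts only non-null entries, and the other aggregations skip missing values by default. The following is a minimal sketch using a made-up Series; the name `scores` and its values are assumptions for illustration, not data from this section.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
scores = pd.Series([80.0, np.nan, 90.0, 100.0])

n_rows = len(scores)      # counts every row, including the NaN
n_valid = scores.count()  # counts only non-null values
total = scores.sum()      # NaN is skipped by default
average = scores.mean()   # computed over the valid values only

print(n_rows, n_valid, total, average)  # 4 3 270.0 90.0
```

The gap between len() and .count() is a quick way to see how many values are missing.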

Example: Aggregating a Series

import pandas as pd

# Salary data
salaries = pd.Series([50000, 60000, 55000, 65000, 70000])

print("Sum:", salaries.sum())
print("Mean:", salaries.mean())
print("Max:", salaries.max())
print("Std Dev:", salaries.std())

Sum: 300000
Mean: 60000.0
Max: 70000
Std Dev: 7905.694150420948

Each method is applied directly to the Series, returning a single value.
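
The standard deviation printed above is the sample standard deviation, because .std() divides by n - 1 by default (ddof=1). As a short sketch of the difference, passing ddof=0 gives the population standard deviation instead; the variable names below are chosen here for illustration.

```python
import pandas as pd

salaries = pd.Series([50000, 60000, 55000, 65000, 70000])

sample_std = salaries.std()            # default ddof=1: divide by n - 1
population_std = salaries.std(ddof=0)  # divide by n instead

print("Sample std:", round(sample_std, 2))          # 7905.69
print("Population std:", round(population_std, 2))  # 7071.07
```

The same ddof parameter applies to .var().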

Example: Aggregating a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000]
}
df = pd.DataFrame(data)

print(df.sum(numeric_only=True))   # Sum of numeric columns
print(df.mean(numeric_only=True))  # Mean of numeric columns
print(df.describe())

Age           82
Salary    165000
dtype: int64
Age          27.333333
Salary    55000.000000
dtype: float64
             Age   Salary
count   3.000000      3.0
mean   27.333333  55000.0
std     3.055050   5000.0
min    24.000000  50000.0
25%    26.000000  52500.0
50%    28.000000  55000.0
75%    29.000000  57500.0
max    30.000000  60000.0

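Notice that the non-numeric column (“Name”) is left out of every result: numeric_only=True tells .sum() and .mean() to skip it, and .describe() summarizes only numeric columns by default.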
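
Another way to leave out text columns such as “Name” is to select the numeric columns first and then aggregate. The sketch below assumes the same three-row data as the example above and uses select_dtypes; `numeric_df` and `totals` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000],
})

# Keep only the numeric columns, then aggregate them
numeric_df = df.select_dtypes(include='number')
totals = numeric_df.sum()

print(list(numeric_df.columns))                  # ['Age', 'Salary']
print(totals.equals(df.sum(numeric_only=True)))  # True
```

Selecting the columns once can be convenient when several aggregations are applied to the same numeric subset.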

More Advanced: Filtering Data and Applying Statistical Functions

We can combine row filtering with aggregation functions to analyze subsets of a DataFrame.

The general syntax is:

df[df['column_name'] <operator> value]['target_column'].function()

where:

  • df[...] → filters the rows that meet the condition (<operator> is a comparison such as >, <, or ==)

  • ['target_column'] → selects the column to aggregate

  • .function() → applies the aggregation function
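
To make the three parts concrete, the pattern can be unrolled into separate steps. This is only a sketch on hypothetical data: the `City` and `Sales` columns and all variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'City': ['Oslo', 'Lima', 'Oslo', 'Lima'],
    'Sales': [100, 200, 300, 400],
})

mask = df['City'] == 'Oslo'      # Step 1: boolean Series, True where the condition holds
oslo_rows = df[mask]             # Step 2: keep only the matching rows
oslo_sales = oslo_rows['Sales']  # Step 3: select the column to aggregate
total = oslo_sales.sum()         # Step 4: apply the aggregation

print(mask.tolist())  # [True, False, True, False]
print(total)          # 400
```

Writing the steps out like this can help when debugging a filter that returns an unexpected result.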

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000]
}
df = pd.DataFrame(data)

# Average salary of employees older than 30
avg_salary = df[df['Age'] > 30]['Salary'].mean()
print(avg_salary)

# Maximum salary for employees younger than 30
print(df[df['Age'] < 30]['Salary'].max())

# Count employees with salary above 60,000
print(df[df['Salary'] > 60000]['Name'].count())

# Standard deviation of salary for people aged 25–40
print(df[(df['Age'] >= 25) & (df['Age'] <= 40)]['Salary'].std())

68000.0
55000
2
7767.45346515403
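
A range condition like the last one needs each comparison wrapped in parentheses and joined with &. As a possible shorthand, Series.between covers the same inclusive range. The sketch below rebuilds the DataFrame from the example and checks that both forms agree; `std_explicit` and `std_between` are names chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000],
})

# Two comparisons combined with & (the parentheses are required)
std_explicit = df[(df['Age'] >= 25) & (df['Age'] <= 40)]['Salary'].std()

# Same inclusive range expressed with between()
std_between = df[df['Age'].between(25, 40)]['Salary'].std()

print(std_explicit == std_between)  # True
```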

So the syntax pattern is:

df[condition]['column'].aggregation()

| Expression | Meaning |
| --- | --- |
| df[df['Age'] > 30]['Salary'].mean() | Mean of Salary where Age > 30 |
| df[df['Salary'] > 60000]['Name'].count() | Count of employees with Salary > 60k |
| df[(df['Age'] >= 25) & (df['Age'] <= 40)]['Salary'].std() | Standard deviation of Salary for 25–40 year olds |

This pattern allows you to filter data first, then aggregate only on the rows that meet your condition.
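
The same filter-then-aggregate idea can also be written with .loc, which takes the row condition and the column label in a single indexing step. This is offered as an alternative spelling under the same sample data, not as the form the examples above rely on; `chained` and `with_loc` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000],
})

# Chained form: filter rows, then select the column
chained = df[df['Age'] > 30]['Salary'].mean()

# .loc form: row condition and column label together
with_loc = df.loc[df['Age'] > 30, 'Salary'].mean()

print(chained, with_loc)  # 68000.0 68000.0
```

Both forms return the same value here; the .loc version avoids creating an intermediate filtered DataFrame before the column is selected.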