Applying Aggregation Functions Directly to a DataFrame

One of the strengths of pandas is that you can call statistical and aggregation methods directly on a DataFrame or Series. Each method summarizes the data in a single call, with no explicit loops or manual calculations.

Common Aggregation Methods

Here are some of the most commonly used methods:

Method	Description	Works On
`.sum()`	Returns the sum of values	DataFrame / Series
`.mean()`	Returns the average (mean) value	DataFrame / Series
`.count()`	Counts non-null values	DataFrame / Series
`.min()`	Returns the minimum value	DataFrame / Series
`.max()`	Returns the maximum value	DataFrame / Series
`.std()`	Returns the sample standard deviation (`ddof=1` by default)	DataFrame / Series
`.var()`	Returns the sample variance (`ddof=1` by default)	DataFrame / Series
`.describe()`	Generates summary statistics (count, mean, std, min, quartiles, max)	DataFrame / Series
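To illustrate the "non-null" wording in the table, here is a minimal sketch using a hypothetical `scores` Series (not part of the lesson's data) with one missing entry. It assumes the default `skipna=True` behavior of `.sum()` and `.mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing entry (illustrative data only)
scores = pd.Series([80.0, np.nan, 90.0, 70.0])

print(len(scores))     # 4: every entry, including the NaN
print(scores.count())  # 3: non-null entries only
print(scores.sum())    # 240.0: the NaN is skipped by default
print(scores.mean())   # 80.0: 240.0 divided by 3, not by 4
```

The difference between `len()` and `.count()` is a quick way to check how many values are missing.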

Example: Aggregating a Series

import pandas as pd

# Salary data
salaries = pd.Series([50000, 60000, 55000, 65000, 70000])

print("Sum:", salaries.sum())
print("Mean:", salaries.mean())
print("Max:", salaries.max())
print("Std Dev:", salaries.std())

Sum: 300000
Mean: 60000.0
Max: 70000
Std Dev: 7905.694150420948

Each method is applied directly to the Series, returning a single value.
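One detail worth knowing about the `Std Dev` value: by default, pandas computes the sample standard deviation, dividing by n - 1. The sketch below reuses the `salaries` Series and compares it with the population version obtained by passing `ddof=0`. The variable names `sample_std` and `population_std` are illustrative.

```python
import pandas as pd

salaries = pd.Series([50000, 60000, 55000, 65000, 70000])

sample_std = salaries.std()            # default ddof=1: divides by n - 1
population_std = salaries.std(ddof=0)  # divides by n

print("Sample std:", sample_std)
print("Population std:", population_std)
```

The sample version is always at least as large as the population version, and the gap shrinks as the number of values grows.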

Example: Aggregating a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000]
}
df = pd.DataFrame(data)

print(df.sum(numeric_only=True))   # Sum of numeric columns
print(df.mean(numeric_only=True))  # Mean of numeric columns
print(df.describe())

Age           82
Salary    165000
dtype: int64
Age          27.333333
Salary    55000.000000
dtype: float64
             Age   Salary
count   3.000000      3.0
mean   27.333333  55000.0
std     3.055050   5000.0
min    24.000000  50000.0
25%    26.000000  52500.0
50%    28.000000  55000.0
75%    29.000000  57500.0
max    30.000000  60000.0

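Notice that the non-numeric column ("Name") does not appear in any of the results. For `.sum()` and `.mean()` this is because we passed `numeric_only=True`; without it, recent versions of pandas will try to aggregate the text column as well, which concatenates the strings for `.sum()` and raises an error for `.mean()`. `.describe()` summarizes only numeric columns by default when the DataFrame contains any.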
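If you prefer to make the column selection explicit, one option is to pick the numeric columns first with `select_dtypes` and then aggregate. This is a sketch of an alternative spelling, not a replacement for the approach above; the names `numeric` and `col_means` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 28],
    'Salary': [50000, 60000, 55000],
})

# Keep only the numeric columns (Age and Salary), then aggregate
numeric = df.select_dtypes(include='number')
col_means = numeric.mean()

print(list(numeric.columns))
print(col_means)

# Same result as asking mean() to skip non-numeric columns
print(col_means.equals(df.mean(numeric_only=True)))
```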

More Advanced: Filtering Data and Applying Statistical Functions

We can combine row filtering with aggregation functions to analyze subsets of a DataFrame.

The general syntax is:

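df[df['column_name'] <operator> value]['target_column'].function()

where:

df['column_name'] <operator> value → the condition, using a comparison operator such as >, <, >=, <=, == or !=
df[…] → filters the rows that meet the condition
['target_column'] → selects the column to aggregate
.function() → applies the aggregation function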

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000]
}
df = pd.DataFrame(data)

# Average salary of employees older than 30
avg_salary = df[df['Age'] > 30]['Salary'].mean()
print(avg_salary)

# Maximum salary for employees younger than 30
print(df[df['Age'] < 30]['Salary'].max())

# Count employees with salary above 60,000
print(df[df['Salary'] > 60000]['Name'].count())

# Standard deviation of salary for people aged 25–40 (inclusive)
print(df[(df['Age'] >= 25) & (df['Age'] <= 40)]['Salary'].std())

68000.0
55000
2
7767.45346515403
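To see what each part of the chained expression does, the first query can be unpacked into intermediate steps. This is only a sketch for illustration; the names `mask`, `filtered`, `target` and `result` are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000],
})

mask = df['Age'] > 30        # boolean Series: one True/False per row
filtered = df[mask]          # keeps only the rows where mask is True
target = filtered['Salary']  # the column to aggregate, as a Series
result = target.mean()       # a single aggregated value

print(mask.tolist())         # [False, True, False, True]
print(filtered['Name'].tolist())
print(result)
```

Writing the steps out like this is also handy for debugging, since you can inspect `filtered` before aggregating.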

So the syntax pattern is:

df[row_condition]['column'].aggregation()

Expression	Meaning
`df[df['Age'] > 30]['Salary'].mean()`	Mean of Salary where Age > 30
`df[df['Salary'] > 60000]['Name'].count()`	Count of employees with Salary > 60k
`df[(df['Age'] >= 25) & (df['Age'] <= 40)]['Salary'].std()`	Standard deviation of Salary for 25–40 year olds

This pattern allows you to filter data first, then aggregate only on the rows that meet your condition.
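The same filter-then-aggregate idea can also be written with `.loc`, which takes the row condition and the column label in one indexing step. The sketch below is an equivalent spelling offered as an option, using `Series.between` (inclusive on both ends by default) for the age range; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 35, 28, 40],
    'Salary': [50000, 66000, 55000, 70000],
})

# Average salary of employees older than 30
avg_salary = df.loc[df['Age'] > 30, 'Salary'].mean()

# Standard deviation of salary for ages 25 to 40, both ends included
in_range = df['Age'].between(25, 40)
std_salary = df.loc[in_range, 'Salary'].std()

print(avg_salary)
print(std_salary)
```

Both spellings give the same numbers here; `.loc` simply avoids chaining two separate indexing operations.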
