Note that you can also apply a function elementwise with applymap() (renamed to DataFrame.map() in pandas 2.1) and to a single column with map(), but these are not covered in this course.

Filtering Data in Pandas#

Once you know how to select columns and rows, the next step is learning how to filter data. Filtering helps you focus on only the relevant part of your dataset, whether that means removing unnecessary columns, isolating rows that meet certain conditions, or preparing features for modeling.

Filtering Columns#

Column filtering is about selecting only the columns you need or dropping the ones you don’t. This reduces memory usage and keeps your DataFrame manageable.

# Assumes `df` is the DataFrame built earlier in this course
# (it must be defined before running this cell, otherwise Python raises a NameError)

# Select a single column (returns a Series)
df['Age']

# Select multiple columns (returns a DataFrame)
df[['Name', 'Age']]
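
As a rough illustration of the memory point above, the sketch below builds a small stand-in DataFrame and compares the footprint of the full frame with that of a two-column selection. The sample values are an assumption reconstructed from the output printed later in this section, not the course's actual dataset.

```python
import pandas as pd

# Hypothetical sample data mirroring the values printed in this section
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [29, 35, 33],
    "Salary": [55000.0, 66000.0, 60500.0],
})

age = df["Age"]                # single brackets with one label -> Series
subset = df[["Name", "Age"]]   # a list of labels -> DataFrame

print(type(age).__name__)      # Series
print(type(subset).__name__)   # DataFrame

# Memory footprint in bytes; deep=True also counts the Python string objects
full_bytes = df.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()
print(subset_bytes < full_bytes)
```

On a three-row frame the saving is negligible; the difference only matters on wide datasets with many rows.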

Dropping Unused Columns#

# Drop the 'Age_squared' column
df = df.drop(columns=['Age_squared'])
print(df)
      Name  Age   Salary  Income_per_Age    Total
0    Alice   29  55000.0     1896.551724  55029.0
1      Bob   35  66000.0     1885.714286  66035.0
2  Charlie   33  60500.0     1833.333333  60533.0

This is especially useful when preparing data for machine learning, where only selected features are required.
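
A minimal sketch of that preparation step might look like the following. The choice of Salary as the target and Age as the only feature is purely hypothetical, and the sample data is an assumption reconstructed from the output above.

```python
import pandas as pd

# Hypothetical sample data; Age_squared stands in for a column we no longer need
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [29, 35, 33],
    "Salary": [55000.0, 66000.0, 60500.0],
    "Age_squared": [841, 1225, 1089],
})

# Option 1: keep only the columns you want as features
feature_cols = ["Age"]     # hypothetical feature list
X = df[feature_cols]       # feature matrix (DataFrame)
y = df["Salary"]           # hypothetical target (Series)

# Option 2: drop what you do not want; errors="ignore" skips labels
# that are not present instead of raising a KeyError
X_alt = df.drop(columns=["Name", "Salary", "Not_a_column"], errors="ignore")

print(list(X.columns))
print(list(X_alt.columns))
```

Selecting by name is usually safer when the set of features is small and fixed; dropping is convenient when you want everything except a few columns.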

Filtering Rows (using Boolean Indexing)#

Row filtering is usually done with Boolean indexing, where you apply a condition and return only the rows where that condition is true.

# Filter rows where Age > 30
df[df['Age'] > 30]
      Name  Age   Salary  Income_per_Age    Total
1      Bob   35  66000.0     1885.714286  66035.0
2  Charlie   33  60500.0     1833.333333  60533.0
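
To make the mechanism more concrete, the sketch below separates the two steps: the comparison first produces a Boolean Series (the mask), and indexing with that mask keeps the rows marked True. The sample data is an assumption based on the output shown above.

```python
import pandas as pd

# Hypothetical sample data mirroring the values printed in this section
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [29, 35, 33],
    "Salary": [55000.0, 66000.0, 60500.0],
})

# Step 1: the condition yields one True/False value per row
mask = df["Age"] > 30
print(mask.tolist())   # [False, True, True]

# Step 2: indexing with the mask keeps only the True rows
older = df[mask]

# .loc accepts the same mask and can select columns at the same time
names_only = df.loc[mask, ["Name", "Age"]]
print(names_only)
```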

Combining Multiple Conditions#

You can combine conditions using & (and) or | (or).

# Filter rows where Age > 30 AND Salary > 60000
df[(df['Age'] > 30) & (df['Salary'] > 60000)]
      Name  Age   Salary  Income_per_Age    Total
1      Bob   35  66000.0     1885.714286  66035.0
2  Charlie   33  60500.0     1833.333333  60533.0

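Remember to wrap each condition in parentheses: & and | bind more tightly than comparison operators such as >, so without parentheses the expression is evaluated in the wrong order and typically raises an error.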
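
The sketch below shows the other common combinations: | for "or", ~ for "not", and DataFrame.query() as an alternative spelling of the AND filter. The sample data is an assumption based on the output shown above.

```python
import pandas as pd

# Hypothetical sample data mirroring the values printed in this section
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [29, 35, 33],
    "Salary": [55000.0, 66000.0, 60500.0],
})

# OR: rows where either condition holds
either = df[(df["Age"] < 30) | (df["Salary"] > 65000)]

# NOT: invert a condition with ~
not_older = df[~(df["Age"] > 30)]

# query() takes the condition as a string and allows the words and / or / not
same_as_and = df.query("Age > 30 and Salary > 60000")

print(either["Name"].tolist())       # ['Alice', 'Bob']
print(not_older["Name"].tolist())    # ['Alice']
print(same_as_and["Name"].tolist())  # ['Bob', 'Charlie']
```

Python's own and / or keywords do not work on Series outside of query(), which is why the symbolic operators are used in bracket indexing.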

Filtering Strings#

You can filter rows where a text column contains a given substring. By default, str.contains() is case-sensitive and treats the pattern as a regular expression:

# Filter rows where Name contains "Bob"
df[df['Name'].str.contains("Bob")]
  Name  Age   Salary  Income_per_Age    Total
1  Bob   35  66000.0     1885.714286  66035.0
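
Real text columns often contain mixed case and missing values, so a few variations are sketched below. The extra rows (a missing name and "bobby") are hypothetical and added only to show the effect of the case, na, and regex parameters.

```python
import pandas as pd

# Hypothetical data: includes a missing value and a lowercase variant
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", None, "bobby"]})

# Case-insensitive match; na=False treats missing values as "no match"
ci = df[df["Name"].str.contains("bob", case=False, na=False)]

# regex=False matches the pattern literally, so "." is just a dot here
literal = df[df["Name"].str.contains("B.b", regex=False, na=False)]

# Prefix match
starts = df[df["Name"].str.startswith("Ch", na=False)]

# Exact match does not need the .str accessor
exact = df[df["Name"] == "Bob"]

print(ci["Name"].tolist())      # ['Bob', 'bobby']
print(len(literal))             # 0
print(starts["Name"].tolist())  # ['Charlie']
print(exact["Name"].tolist())   # ['Bob']
```

Without na=False, a missing value produces a missing entry in the mask, which cannot be used for Boolean indexing.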

Unique Values and Counting#

Sometimes you want to check how many unique values a column has, or count how often each appears.

# Unique names
print(df['Name'].unique())

# Count frequency of each name
print(df['Name'].value_counts())
['Alice' 'Bob' 'Charlie']
Name
Alice      1
Bob        1
Charlie    1
Name: count, dtype: int64
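
Because every name in the example above appears exactly once, the counts are not very informative. The sketch below uses a hypothetical column with repeated names to show nunique() and the normalize option of value_counts(); the data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with repeated values
df = pd.DataFrame({"Name": ["Alice", "Bob", "Alice", "Charlie", "Alice"]})

# Number of distinct values
n_unique = df["Name"].nunique()

# Frequency of each value, most frequent first
counts = df["Name"].value_counts()

# Relative frequencies (fractions that sum to 1)
shares = df["Name"].value_counts(normalize=True)

print(n_unique)          # 3
print(counts["Alice"])   # 3
print(shares["Alice"])   # 0.6
```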