Types of Feature Engineering#
Feature engineering generally involves three major types of operations:
- Feature Creation
- Feature Transformation
- Feature Selection
Each type plays a different role in improving the quality of input data for machine learning models.
1. Feature Creation#
Feature creation involves generating new variables from existing data. These new variables may capture relationships that the original features do not directly represent.
Examples:
| Original Features | Engineered Feature |
|---|---|
| height, weight | BMI = weight / height² |
| purchase_count, visits | average_purchase_value |
| date | day_of_week, month, is_weekend |
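For instance, the date-derived features in the last row can be computed with pandas' `.dt` accessor. A minimal sketch, using made-up dates purely for illustration:

```python
import pandas as pd

# Hypothetical order dates, purely for illustration
df = pd.DataFrame({"date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-08"])})

# Derive calendar features from the raw date column
df["day_of_week"] = df["date"].dt.dayofweek        # Monday = 0 ... Sunday = 6
df["month"] = df["date"].dt.month
df["is_weekend"] = df["date"].dt.dayofweek >= 5    # Saturday or Sunday
```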
Example: Housing Dataset#
Original features:
| Size_sqft | Bedrooms | Price |
|---|---|---|
| 1500 | 3 | 450000 |
Engineered features:
- price_per_sqft
- bedrooms_per_sqft
- house_age
These engineered features may better capture housing patterns.
```python
import pandas as pd

df = pd.DataFrame({
    "size_sqft": [1500, 1800, 1200],
    "price": [450000, 520000, 350000]
})
df["price_per_sqft"] = df["price"] / df["size_sqft"]
df
```
|   | size_sqft | price | price_per_sqft |
|---|---|---|---|
| 0 | 1500 | 450000 | 300.000000 |
| 1 | 1800 | 520000 | 288.888889 |
| 2 | 1200 | 350000 | 291.666667 |
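The other two engineered features follow the same pattern. A sketch, assuming hypothetical `bedrooms` and `year_built` columns that are not in the DataFrame above:

```python
import pandas as pd

# Hypothetical bedrooms and year_built values for the same three houses
df = pd.DataFrame({
    "size_sqft": [1500, 1800, 1200],
    "bedrooms": [3, 4, 2],
    "year_built": [2005, 1998, 2012]
})

df["bedrooms_per_sqft"] = df["bedrooms"] / df["size_sqft"]
df["house_age"] = 2024 - df["year_built"]  # relative to an assumed current year
```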
Remember: feature creation often relies on domain knowledge to design useful variables.
Types of Feature Creation#
Common creation techniques include:
- Polynomial Features
- Interaction Features
1a. Polynomial Features#
Polynomial features are created by raising existing features to a power. They allow models to capture nonlinear relationships between variables.
For example, suppose we are predicting house prices using the size of a house.
| Size (sqft) | Price |
|---|---|
| 1000 | 200000 |
| 1500 | 300000 |
| 2000 | 450000 |
The relationship between size and price may not be perfectly linear. To capture nonlinear patterns, we can create polynomial features:
Example:#
| Size | Size² |
|---|---|
| 1000 | 1,000,000 |
| 1500 | 2,250,000 |
| 2000 | 4,000,000 |
These new variables allow models to learn curved relationships.
Example Python Code#
```python
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd

df = pd.DataFrame({
    "size": [1000, 1500, 2000]
})
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df)
pd.DataFrame(poly_features, columns=["size", "size_squared"])
```
Polynomial features are commonly used with models such as:
- Linear Regression
- Logistic Regression
They allow simple models to capture complex patterns.
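As a sketch of how this works end to end, the toy prices above can be fit with an ordinary `LinearRegression` once size is expanded into polynomial terms; the model stays linear in its inputs but becomes quadratic in the original size:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Sizes and prices from the table above
X = np.array([[1000], [1500], [2000]])
y = np.array([200000, 300000, 450000])

# Expand size into [size, size^2], then fit a plain linear model
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_poly, y)

predictions = model.predict(X_poly)
```

With three points and three parameters the fitted quadratic passes exactly through the data; on realistic datasets it would only approximate it.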
1b. Interaction Features#
Interaction features are created by multiplying two or more features together. They capture relationships where the combined effect of multiple variables matters.
Example#
Suppose we want to predict house prices using:
- house size
- neighborhood quality score
| Size | Neighborhood Score |
|---|---|
| 1500 | 8 |
| 1500 | 4 |
Two houses may have the same size, but if one is in a better neighborhood, the price may be higher. An interaction feature can capture this relationship.
Example Table#
| Size | Neighborhood | Size × Neighborhood |
|---|---|---|
| 1500 | 8 | 12000 |
| 1500 | 4 | 6000 |
This feature helps the model understand that price depends on both variables together.
Example Python Code#
```python
# Assumes df already contains "size" and "neighborhood_score" columns
df["size_neighborhood_interaction"] = df["size"] * df["neighborhood_score"]
```
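A self-contained version of the same idea, rebuilding the two hypothetical houses from the table above:

```python
import pandas as pd

# The two hypothetical houses from the table above
df = pd.DataFrame({
    "size": [1500, 1500],
    "neighborhood_score": [8, 4]
})

# Same size, different neighborhood: the product tells them apart
df["size_neighborhood_interaction"] = df["size"] * df["neighborhood_score"]
```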
Why These Features Matter#
Polynomial and interaction features help models capture complex relationships in data.
- Polynomial features capture nonlinear patterns.
- Interaction features capture relationships between multiple variables.
These techniques are widely used in:
- regression models
- recommendation systems
- predictive analytics