Types of Feature Engineering#

Feature engineering generally involves three major types of operations:

  • Feature Creation

  • Feature Transformation

  • Feature Selection

Each type plays a different role in improving the quality of input data for machine learning models.

1. Feature Creation#

Feature creation involves generating new variables from existing data. These new variables may capture relationships that the original features do not directly represent.

Examples:

| Original Features | Engineered Feature |
|---|---|
| height, weight | BMI = weight / height² |
| purchase_count, visits | average_purchase_value |
| date | day_of_week, month, is_weekend |
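The first and third rows of the table can be sketched in pandas. The sample values below are made up for illustration; the date features come from pandas' `.dt` accessor:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "height": [1.75, 1.60],   # metres
    "weight": [70, 55],       # kilograms
    "date": pd.to_datetime(["2024-03-02", "2024-03-04"]),
})

# BMI = weight / height²
df["bmi"] = df["weight"] / df["height"] ** 2

# Date-derived features via the .dt accessor
df["day_of_week"] = df["date"].dt.dayofweek          # Monday = 0
df["month"] = df["date"].dt.month
df["is_weekend"] = df["date"].dt.dayofweek >= 5      # Saturday or Sunday
```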

Example: Housing Dataset#

Original features:

| Size_sqft | Bedrooms | Price |
|---|---|---|
| 1500 | 3 | 450000 |

Engineered features:

  • price_per_sqft

  • bedrooms_per_sqft

  • house_age

These engineered features may better capture housing patterns.

```python
import pandas as pd

df = pd.DataFrame({
    "size_sqft": [1500, 1800, 1200],
    "price": [450000, 520000, 350000]
})

df["price_per_sqft"] = df["price"] / df["size_sqft"]

df
```

```
   size_sqft   price  price_per_sqft
0       1500  450000      300.000000
1       1800  520000      288.888889
2       1200  350000      291.666667
```

Remember: feature creation often relies on domain knowledge to design useful variables.

Types of Feature Creation:#

Common creation techniques include:

  • Polynomial Features

  • Interaction Features

1a. Polynomial Features#

Polynomial features are created by raising existing features to a power. They allow models to capture nonlinear relationships between variables.

For example, suppose we are predicting house prices using the size of a house.

| Size (sqft) | Price |
|---|---|
| 1000 | 200000 |
| 1500 | 300000 |
| 2000 | 450000 |

The relationship between size and price may not be perfectly linear. To capture nonlinear patterns, we can create polynomial features:

\[ x^2, x^3, \dots \]

Example:#

| Size | Size² |
|---|---|
| 1000 | 1,000,000 |
| 1500 | 2,250,000 |

These new variables allow models to learn curved relationships.

Example Python Code#

```python
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd

df = pd.DataFrame({
    "size": [1000, 1500, 2000]
})

# degree=2 adds a squared term; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df)

pd.DataFrame(poly_features, columns=["size", "size_squared"])
```

Polynomial features are commonly used with models such as:

  • Linear Regression

  • Logistic Regression

They allow simple models to capture complex patterns.
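As a minimal sketch of that idea, the toy size/price values from the table above can be fit with a linear regression on polynomial features. The data and variable names here are illustrative, not from a real dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1000], [1500], [2000]])      # size in sqft
y = np.array([200000, 300000, 450000])      # price

# Expand size into [size, size²]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# A linear model on the expanded features can fit the curvature
model = LinearRegression().fit(X_poly, y)
```

With three points and three parameters (intercept, size, size²), the quadratic model fits these values exactly, which a plain straight line cannot.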

1b. Interaction Features#

Interaction features are created by multiplying two or more features together. They capture relationships where the combined effect of multiple variables matters.

Example#

Suppose we want to predict house prices using:

  • house size

  • neighborhood quality score

| Size | Neighborhood Score |
|---|---|
| 1500 | 8 |
| 1500 | 4 |

Two houses may have the same size, but if one is in a better neighborhood, the price may be higher. An interaction feature can capture this relationship.

\[ \text{Interaction} = \text{Size} \times \text{NeighborhoodScore} \]

Example Table#

| Size | Neighborhood | Size × Neighborhood |
|---|---|---|
| 1500 | 8 | 12000 |
| 1500 | 4 | 6000 |

This feature helps the model understand that price depends on both variables together.

Example Python Code#

```python
# assumes df already contains "size" and "neighborhood_score" columns
df["size_neighborhood_interaction"] = df["size"] * df["neighborhood_score"]
```
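A self-contained version, using the hypothetical values from the table above:

```python
import pandas as pd

# Hypothetical data matching the example table
df = pd.DataFrame({
    "size": [1500, 1500],
    "neighborhood_score": [8, 4],
})

# Multiply the two features to form the interaction term
df["size_neighborhood_interaction"] = df["size"] * df["neighborhood_score"]
```

This reproduces the interaction values 12000 and 6000 shown in the table.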

Why These Features Matter#

Polynomial and interaction features help models capture complex relationships in data.

  • Polynomial features capture nonlinear patterns.

  • Interaction features capture relationships between multiple variables.

These techniques are widely used in:

  • regression models

  • recommendation systems

  • predictive analytics