Types of Regression#

As we move from simple problems to more complex ones, different types of regression models are used depending on the nature of the data and the relationship.

1. Simple Linear Regression#

The simplest form of regression is linear regression with one variable, where we model the relationship between a single input feature and the target using a straight line. Example: predicting salary based on years of experience.

  • One input variable

  • Assumes a linear relationship

  • Easy to interpret

The model is written as:

\[y = mx + b\]

Where:

  • \(y\) is the predicted value

  • \(x\) is the input feature

  • \(m\) is the slope (how much \(y\) changes with \(x\))

  • \(b\) is the intercept (value of \(y\) when \(x=0\))

A Simple View of Linear Regression. Source: Medium.com

Intuition#

Think of this as fitting the “best possible straight line” through your data points. The model tries to capture the overall trend in the data.

For example:

  • If house size increases, price usually increases → positive slope

  • If study hours increase, exam score increases → positive slope

Key Insight: The model does not try to pass through every point. Instead, it minimizes the overall error across all points.
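For one feature, the slope and intercept that minimize the overall squared error have a simple closed form. A minimal sketch with NumPy, using small made-up experience/salary numbers (the data values are assumed, purely for illustration):

```python
import numpy as np

# Hypothetical data: years of experience vs. salary in $1000s (assumed values)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([35, 42, 50, 55, 63], dtype=float)

# Least-squares slope: covariance of x and y divided by variance of x
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line always passes through the point of means
b = y.mean() - m * x.mean()

print(f"y = {m:.2f}x + {b:.2f}")  # → y = 6.90x + 28.30
```

Note that the fitted line misses every individual point slightly, yet captures the overall upward trend, which is exactly the behavior described above.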

2. Multiple Linear Regression#

In reality, most outcomes depend on multiple factors; a single feature is rarely sufficient. Multiple linear regression extends the idea of a straight line to multiple dimensions.

  • Multiple input variables

  • Each feature contributes to the prediction

  • Helps capture more realistic scenarios

Example: Predicting house price depends on:

  • Size

  • Number of bedrooms

  • Location

  • Age of the property

This leads to multiple linear regression, where we use several features:

\[y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n\]

Where:

  • \(n\) is the number of features

  • \(b_0\) is the intercept

  • \(b_1, b_2, \dots, b_n\) are coefficients

  • Each \(x_i\) is a feature

Each feature contributes additively to the prediction.

Interpretation#

One of the most powerful aspects of multiple regression is interpretation:

  • \(b_1\): effect of \(x_1\) while keeping all other variables constant

  • \(b_2\): effect of \(x_2\), and so on

This allows us to answer questions like:

  • “How much does price increase per extra bedroom, holding size constant?”
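As a sketch of this interpretation, we can fit scikit-learn's LinearRegression on small made-up housing data (the numbers are assumed, purely for illustration) and read off the coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: columns are [size in sqft, bedrooms] -> price in $1000s
X = np.array([[1000, 2], [1500, 3], [1200, 2], [1800, 4], [2000, 3]], dtype=float)
y = np.array([200, 290, 235, 360, 380], dtype=float)

model = LinearRegression().fit(X, y)

# coef_[0]: price change per extra sqft, holding bedrooms constant
# coef_[1]: price change per extra bedroom, holding size constant
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
```

Each coefficient answers a "holding everything else constant" question, which is what makes the fitted model interpretable.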

Important Note#

Features should not be highly correlated with each other (multicollinearity), as it can make interpretation unstable.
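A quick way to spot multicollinearity is to inspect pairwise feature correlations before fitting. A minimal sketch with NumPy on assumed feature values:

```python
import numpy as np

# Hypothetical feature matrix: columns are size, bedrooms, age (assumed values)
X = np.array([[1000, 2, 30], [1500, 3, 20], [1200, 2, 25],
              [1800, 4, 10], [2000, 3, 5]], dtype=float)

# Pairwise correlations between columns; values near ±1 signal multicollinearity
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

If two features are almost perfectly correlated, consider dropping one or combining them before interpreting the coefficients.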

3. Non-Linear Relationships and Polynomial Regression#

Not all relationships are linear. Sometimes the data curves, and polynomial regression allows the model to fit curves instead of straight lines.

  • Captures non-linear patterns

  • Still based on linear modeling techniques

  • Risk of overfitting if degree is too high

Example:

  • Growth may accelerate over time

  • Costs may increase at an increasing rate

To handle this, we use polynomial regression:

\[ y = a + bx + cx^2 + dx^3 + \dots\]

A Simple View of Polynomial Regression. Source: Statisticalaid.com

Even though the curve looks non-linear, the model is still linear in parameters, which allows us to train it using the same techniques as linear regression.

Tradeoff#

  • Low degree → underfitting

  • High degree → overfitting

Choosing the right degree is critical.
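One way to see this tradeoff is to compare held-out error across degrees. A minimal sketch on synthetic data (the quadratic trend and noise level are assumed, purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Hypothetical curved data: a quadratic trend plus noise
X = rng.uniform(0, 3, size=(60, 1))
y = 1 + 2 * X[:, 0] + 1.5 * X[:, 0] ** 2 + rng.normal(0, 0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Compare test error: too low a degree underfits, too high a degree can overfit
mse = {}
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: test MSE = {mse[degree]:.3f}")
```

Since the underlying trend here is quadratic, degree 1 leaves a clearly higher test error than degree 2; in practice the degree is usually chosen by this kind of held-out comparison or by cross-validation.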

Python Implementation#

In practice, regression is often implemented using libraries such as scikit-learn.
Basic Linear Regression#

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be defined, e.g. as NumPy arrays

# split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# train model
model = LinearRegression()
model.fit(X_train, y_train)

# predictions on held-out data
predictions = model.predict(X_test)
```

Basic Polynomial Regression#

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# expand X with polynomial terms (here up to degree 2), then fit a linear model
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)
```