Bias-Variance Tradeoff#

A key concept in regression (and ML in general):

  • High Bias → Model is too simple → Underfitting

  • High Variance → Model is too complex → Overfitting

Goal:#

Find a balance where the model generalizes well to unseen data.
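To make the tradeoff concrete, here is a minimal sketch on synthetic data (the sine curve, sample size, and polynomial degrees are illustrative choices, not from the text): a degree-1 polynomial underfits, while a degree-15 polynomial fits the training points very closely by tracking their noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: a noisy sine wave
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 30)

# Noise-free grid for judging generalization
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

results = {}
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The simple model has high training error (high bias); the complex model drives training error down but its test error reveals how well (or poorly) it actually generalizes.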

Overfitting and Regularization#

Overfitting#

Occurs when the model:

  • Fits training data extremely well

  • Fails on new (test) data

Solutions#

  1. Reduce model complexity

  2. Use fewer features

  3. Apply regularization

Regularization Techniques#

As models become more complex (especially with many features), they may start overfitting.

Regularization helps control this by penalizing large coefficients.

Ridge Regression (L2)#

  • Adds a penalty on the squared coefficients:

\[\lambda \sum w_i^2\]
  • Shrinks coefficients toward zero but keeps all features (weights are reduced, never removed)

Helps when:

  • Many features exist

  • Features are slightly correlated
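As a quick sketch of the shrinkage effect (the data here is hypothetical, not from the text), compare the coefficient norms of ordinary least squares and ridge on two nearly collinear features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical correlated features: x2 is nearly a copy of x1
rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost collinear
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty pulls the coefficient vector toward zero
print("OLS   ||w||^2:", np.sum(ols.coef_ ** 2))
print("Ridge ||w||^2:", np.sum(ridge.coef_ ** 2))
```

With collinear features, unregularized least squares can assign large opposite-signed weights to the two copies; ridge spreads a smaller total weight across them.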

Lasso Regression (L1)#

Adds a penalty on the absolute values of the coefficients:

\[\lambda \sum |w_i|\]
  • Can shrink some coefficients to exactly zero

This makes it useful for:

  • Feature selection

  • Simplifying models
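A minimal sketch of this selection effect on synthetic data (the feature counts and `alpha` value are illustrative choices): only a few features actually drive the target, and Lasso zeroes out most of the rest.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 2 actually influence the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=2,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
print("features zeroed out:", int(np.sum(lasso.coef_ == 0)))
```

The exactly-zero coefficients are what make Lasso usable as a feature selector, unlike Ridge's small-but-nonzero weights.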

Elastic Net#

  • Combines Ridge and Lasso

  • Useful when features are highly correlated
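A minimal usage sketch with synthetic data (values are illustrative): `l1_ratio` blends the two penalties, with 1.0 being pure Lasso and 0.0 pure Ridge.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Illustrative data: two of five features drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# alpha sets overall penalty strength; l1_ratio mixes L1 vs L2
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print("coefficients:", np.round(enet.coef_, 2))
```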


Implementation of Ridge and Lasso#

```python
from sklearn.linear_model import Ridge, Lasso

# X_train and y_train are assumed to come from an earlier train/test split
ridge = Ridge(alpha=1.0)   # alpha sets the strength of the L2 penalty
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=0.1)   # alpha sets the strength of the L1 penalty
lasso.fit(X_train, y_train)
```


Worked Example (Conceptual)#

Suppose we have:

| Size (sq ft) | Price ($) |
|--------------|-----------|
| 1000         | 200,000   |
| 1500         | 300,000   |
| 2000         | 400,000   |

We try to fit a line:

\[Price = m \cdot Size + b\]

The model finds:

  • A slope \(m\) that captures how price changes with size

  • An intercept \(b\) that anchors the line

Once trained, we can predict:

  • Price of an 1800 sq ft house

  • Price of unseen data points
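The fit above can be sketched with scikit-learn. The three rows lie exactly on a line, so the model recovers a slope of 200 $/sq ft and an intercept near zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The three houses from the table above
X = np.array([[1000], [1500], [2000]])     # size in sq ft
y = np.array([200_000, 300_000, 400_000])  # price in $

model = LinearRegression().fit(X, y)
print("slope m:", model.coef_[0])          # price change per sq ft
print("intercept b:", model.intercept_)
print("price at 1800 sq ft:", model.predict([[1800]])[0])  # approximately 360,000
```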

Applications of Regression#

Regression is widely used across domains:

  • Finance: stock price prediction, risk modeling

  • Marketing: sales forecasting

  • Healthcare: predicting disease progression

  • Economics: demand estimation

  • Engineering: system modeling