Bias-Variance Tradeoff#
A key concept in regression (and ML in general):
High Bias → Model is too simple → Underfitting
High Variance → Model is too complex → Overfitting
Goal:#
Find a balance where the model generalizes well to unseen data.
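The tradeoff above can be illustrated with a small experiment. This is a minimal sketch on synthetic data, not a definitive benchmark: the sine-shaped ground truth, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration. It fits polynomial models of increasing complexity and compares the error on the training data with the error on unseen data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: a sine curve plus Gaussian noise
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(np.pi * x[:, 0]) + rng.normal(scale=0.2, size=n)
    return x, y

X_train, y_train = make_data(40)
X_test, y_test = make_data(200)

results = {}
for degree in (1, 4, 15):
    # Higher degree means a more flexible (more complex) model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

In a run like this, the degree-1 model typically shows high error on both sets (underfitting), while a very high degree tends to push training error down and test error up (overfitting). The exact numbers depend on the random seed and the noise level.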
Overfitting and Regularization#
Overfitting#
Occurs when the model:
Fits training data extremely well
Fails on new (test) data
Solutions#
Reduce model complexity
Use fewer features
Apply regularization
Regularization Techniques#
4. Regularized Regression (Controlling Overfitting)#
As models become more complex (especially with many features), they may start overfitting.
Regularization helps control this by penalizing large coefficients.
Ridge Regression (L2)#
Adds a penalty on the squared coefficients
Shrinks coefficients toward zero but keeps all features
Helps when:
Many features exist
Features are correlated with each other (multicollinearity)
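To make the L2 penalty concrete, the Ridge objective can be written as below. The symbols are assumptions introduced here for illustration: \(y_i\) is the observed target, \(\hat{y}_i\) the model's prediction, \(w_j\) the coefficient of feature \(j\), and \(\lambda \ge 0\) the penalty strength (the role played by `alpha` in scikit-learn). Exact scaling conventions differ between libraries.

\[
\min_{w,\, b} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} w_j^2
\]

Setting \(\lambda = 0\) recovers ordinary least squares; larger values shrink the coefficients more strongly.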
Lasso Regression (L1)#
Adds a penalty on the absolute values of the coefficients
Can shrink some coefficients to exactly zero
This makes it useful for:
Feature selection
Simplifying models
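The L1 penalty can be written in the same style. As an assumption for illustration, \(y_i\) is the observed target, \(\hat{y}_i\) the prediction, \(w_j\) the coefficient of feature \(j\), and \(\lambda \ge 0\) the penalty strength (`alpha` in scikit-learn, which additionally rescales the squared-error term by the number of samples).

\[
\min_{w,\, b} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
\]

Because the absolute value has a corner at zero, the minimum can sit exactly at \(w_j = 0\) for some features, which is what makes Lasso usable for feature selection.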
Elastic Net#
Combines Ridge and Lasso
Useful when features are highly correlated
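A minimal sketch of Elastic Net with scikit-learn follows. The synthetic data (two nearly identical features plus one independent feature) and the hyperparameter values are assumptions chosen only to illustrate the correlated-feature case.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_samples = 200
base = rng.normal(size=n_samples)

# Two highly correlated features plus one independent feature (hypothetical data)
X = np.column_stack([
    base,
    base + rng.normal(scale=0.01, size=n_samples),
    rng.normal(size=n_samples),
])
y = 2.0 * base + rng.normal(scale=0.1, size=n_samples)

# alpha: overall penalty strength
# l1_ratio: mix between the L1 penalty (1.0) and the L2 penalty (0.0)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)

print("Elastic Net coefficients:", np.round(enet.coef_, 3))
```

The L2 part of the penalty tends to spread weight across the correlated pair, whereas a pure Lasso fit often keeps only one of the two. In practice both `alpha` and `l1_ratio` are usually tuned by cross-validation.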
## Implementation of Ridge and Lasso
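```python
from sklearn.linear_model import Ridge, Lasso

# X_train, y_train: training features and targets, assumed to be prepared earlier
ridge = Ridge(alpha=1.0)  # alpha sets the strength of the L2 penalty
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=0.1)  # alpha sets the strength of the L1 penalty
lasso.fit(X_train, y_train)
```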
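To see the difference between the two penalties, the following sketch fits both models on synthetic data and counts coefficients that are exactly zero. The data-generating setup (ten features, of which only the first three influence the target) is a hypothetical example, not part of the original material.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
n_samples, n_features = 200, 10
X_train = rng.normal(size=(n_samples, n_features))

# Hypothetical ground truth: only the first three features matter
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y_train = X_train @ true_coef + rng.normal(scale=0.5, size=n_samples)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

# Count coefficients that were driven exactly to zero
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Exact zeros -> Ridge:", n_zero_ridge, " Lasso:", n_zero_lasso)
```

Ridge keeps small but nonzero weights on the irrelevant features, while Lasso tends to set them to exactly zero. How many are removed depends on `alpha` and on the data.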
Worked Example (Conceptual)#
Suppose we have:
| Size (sq ft) | Price ($) |
|---|---|
| 1000 | 200,000 |
| 1500 | 300,000 |
| 2000 | 400,000 |
We try to fit a line:
\[ \text{Price} = m \cdot \text{Size} + b \]
The model finds:
A slope \(m\) that captures how price changes with size
An intercept \(b\) that anchors the line
Once trained, we can predict:
The price of an 1800 sq ft house
The price of any other house size not in the training data
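The worked example can be reproduced in a few lines. This is a minimal sketch using scikit-learn's `LinearRegression` on the three rows of the table; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data from the table: size in sq ft, price in dollars
X = np.array([[1000], [1500], [2000]])
y = np.array([200_000, 300_000, 400_000])

model = LinearRegression().fit(X, y)

m = model.coef_[0]    # slope: price change per additional sq ft
b = model.intercept_  # intercept: predicted price at size 0

# Predict the price of an unseen 1800 sq ft house
pred = model.predict(np.array([[1800]]))[0]

print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print(f"predicted price for 1800 sq ft = {pred:,.0f}")
```

Because the three points lie exactly on a line, the fit recovers a slope of about 200 dollars per sq ft and an intercept of about 0, so the prediction for 1800 sq ft is about 360,000 dollars. Real data is noisy, and the fitted line then only approximates the points.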
Applications of Regression#
Regression is widely used across domains:
Finance: stock price prediction, risk modeling
Marketing: sales forecasting
Healthcare: predicting disease progression
Economics: demand estimation
Engineering: system modeling