Evaluation Metrics#
To evaluate regression models, we use several metrics:
Mean Squared Error (MSE)#
- Penalizes large errors more heavily than small ones
- Commonly used as the optimization objective during training
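A quick sketch of what the metric computes, by hand and via scikit-learn; the arrays below are toy values chosen for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values for illustration
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE is the mean of the squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both 1.3125
```

The squaring is what makes MSE penalize large errors disproportionately: the single error of 2 here contributes 4 to the sum, while the error of 0.5 contributes only 0.25.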
Why Mean Squared Error (MSE)?#
MSE is widely used because:
- It strongly penalizes large mistakes
- It leads to a smooth optimization surface
- It has a closed-form solution (ordinary least squares, OLS)
- It works well with gradient-based optimization
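The closed-form OLS solution mentioned above can be sketched with the normal equations, \(w = (X^T X)^{-1} X^T y\), which minimize MSE directly. The data here are made-up values that lie exactly on the line \(y = 2x + 1\):

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1 (bias column of ones + one feature)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Normal equations: solve (X^T X) w = X^T y for the MSE-minimizing weights
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # ≈ [1. 2.]  (intercept 1, slope 2)
```

Solving the linear system is preferred over explicitly inverting \(X^T X\); it is the same closed form, computed more stably.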
Alternative Error Measures (Brief Insight)#
Although MSE is standard, other options exist:
- MAE (Mean Absolute Error) → treats all errors equally
- RMSE (Root Mean Squared Error) → same unit as the target variable
Each metric emphasizes different aspects of model performance.
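The difference in emphasis is easy to see with a contrived example (the values below are assumptions for illustration): two sets of predictions with the same MAE but very different MSE, because only one of them contains a large outlier error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.zeros(5)
pred_uniform = np.full(5, 2.0)                        # five errors of size 2
pred_outlier = np.array([0.0, 0.0, 0.0, 0.0, 10.0])   # one error of size 10

mae_uniform = mean_absolute_error(y_true, pred_uniform)  # 2.0
mae_outlier = mean_absolute_error(y_true, pred_outlier)  # 2.0 — MAE can't tell them apart
mse_uniform = mean_squared_error(y_true, pred_uniform)   # 4.0
mse_outlier = mean_squared_error(y_true, pred_outlier)   # 20.0 — MSE flags the outlier
```

If occasional large misses are costly in your application, MSE (or RMSE) is the more informative choice; if all errors matter equally, MAE is.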
Mean Absolute Error (MAE)#
- Easy to interpret
- Treats all errors equally
Root Mean Squared Error (RMSE)#
- Same unit as the target variable
- Easier to interpret than MSE
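RMSE is just the square root of MSE, which brings the error back into the target's own units. A minimal sketch, again with toy values (imagine the targets are prices):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values for illustration
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 320.0])

mse = mean_squared_error(y_true, y_pred)  # in squared units
rmse = np.sqrt(mse)                       # back in the target's units (~14.14 here)
print(rmse)
```

An RMSE of about 14 reads directly as "predictions are typically off by roughly 14 units", which is why it is often reported instead of MSE.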
R-squared (R²)#
- Measures how well the model explains the variance in the target
- Typically ranges from 0 to 1 (it can go negative when the model fits worse than simply predicting the mean)
- 1 → perfect fit
- 0 → no explanatory power (the model does no better than predicting the mean)
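The two boundary cases are easy to verify with scikit-learn's `r2_score`; the target values below are arbitrary toy numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy target values
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions → R² = 1
r2_perfect = r2_score(y_true, y_true)           # 1.0

# Constant prediction of the mean → R² = 0
y_mean = np.full_like(y_true, y_true.mean())
r2_mean = r2_score(y_true, y_mean)              # 0.0
```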
Understanding the Components#
Total Sum of Squares \(SS_{tot}\)

$$
SS_{tot} = \sum (y_i - \bar{y})^2
$$

- Captures the total variation in the data
- Measures how far actual values are from the mean
- Represents the baseline (no model)
Residual Sum of Squares \(SS_{res}\)

$$
SS_{res} = \sum (y_i - \hat{y}_i)^2
$$

- Captures the unexplained variation
- Measures how far predictions are from actual values
- Represents model error
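Both components can be computed directly from the definitions above, and combining them as \(1 - SS_{res}/SS_{tot}\) reproduces scikit-learn's `r2_score`. The arrays are toy values for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy values for illustration
y = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean (baseline)
ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the model

r2 = 1 - ss_res / ss_tot
print(r2, r2_score(y, y_hat))  # the two values agree
```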
Intuition#
- \(SS_{tot}\): Total variability in the data
- \(SS_{res}\): Remaining error after modeling
Remember:
- Smaller \(SS_{res}\) → better fit → higher \(R^2\)
- If \(SS_{res} = 0\), then \(R^2 = 1\) (perfect model)
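Putting the two components together gives the standard definition of \(R^2\):

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$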
Simple Takeaway#
R² tells us what fraction of the data’s variability (spread) is explained by the model.
```python
# Evaluating the model on the held-out test set
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("MSE:", mse)
print("R2 Score:", r2)
```