Assumptions of Linear Regression (L.I.N.E)#
For linear regression to produce reliable and interpretable results, certain assumptions must hold.
A helpful way to remember them is L.I.N.E:

- L — Linearity
- I — Independence of errors
- N — Normality of errors
- E — Equal variance (homoscedasticity)
1. Linearity#
The relationship between the input variables and the target should be linear.
If the true relationship is curved, a linear model will underfit the data.
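A quick way to see this underfitting is to fit a straight line to deliberately curved data and inspect the residuals. The sketch below (a minimal illustration using NumPy; the quadratic data and noise level are invented for the example) shows the telltale pattern: residuals are positive at the extremes and negative in the middle.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.5, size=x.size)  # true relationship is curved

# Fit a straight line y = a*x + b by least squares
a, b = np.polyfit(x, y, deg=1)
residuals = y - (a * x + b)

# A linear fit to curved data leaves structure in the residuals:
# positive at both ends of the range, negative in the middle.
print(residuals[:10].mean() > 0)     # left end: True
print(residuals[95:105].mean() < 0)  # middle: True
```

If the residuals showed no such pattern, the linearity assumption would be more plausible.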
2. Independence of Errors#
Residuals (errors) should not be correlated with each other.
This is especially important in time-series data, where errors can depend on previous observations.
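A common diagnostic for correlated errors is the Durbin-Watson statistic, sketched below with plain NumPy (the AR(1) error series and its 0.9 coefficient are illustrative choices, not part of any particular dataset). Values near 2 are consistent with independent errors; values near 0 suggest positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 suggests no autocorrelation,
    values near 0 suggest positive autocorrelation."""
    diff = np.diff(residuals)
    return np.sum(diff**2) / np.sum(residuals**2)

rng = np.random.default_rng(1)
independent = rng.normal(size=500)

# AR(1) errors: each error depends strongly on the previous one,
# as often happens in time-series data
correlated = np.empty(500)
correlated[0] = rng.normal()
for t in range(1, 500):
    correlated[t] = 0.9 * correlated[t - 1] + rng.normal()

print(durbin_watson(independent))  # close to 2
print(durbin_watson(correlated))   # well below 2
```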
3. Normality of Errors#
Residuals should be approximately normally distributed.
This assumption is particularly important for:

- statistical inference
- confidence intervals
- hypothesis testing
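One way to check this assumption is a formal normality test such as Shapiro-Wilk. The sketch below assumes SciPy is available; the two residual series are synthetic examples, one normal and one deliberately skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_resid = rng.normal(size=300)
skewed_resid = rng.exponential(size=300) - 1  # clearly non-normal

# Shapiro-Wilk test: a small p-value means normality is rejected
_, p_normal = stats.shapiro(normal_resid)
_, p_skewed = stats.shapiro(skewed_resid)
print(f"normal residuals:  p = {p_normal:.3f}")
print(p_skewed < 0.05)  # True: normality rejected for the skewed errors
```

In practice a Q-Q plot of the residuals gives the same information visually.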
4. Equal Variance (Homoscedasticity)#
The spread of residuals should remain constant across all values of the input.
If variance changes (heteroscedasticity), predictions may become unreliable.
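Heteroscedasticity is often visible by simply comparing residual spread across the range of the input. Below is a minimal NumPy sketch (the noise model, where the error scale grows with \(x\), is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 400)

# Heteroscedastic errors: the noise scale grows with x
y = 2 * x + rng.normal(scale=0.5 * x)

a, b = np.polyfit(x, y, deg=1)
resid = y - (a * x + b)

# Compare residual spread for small x vs. large x
low_spread = resid[:200].std()
high_spread = resid[200:].std()
print(high_spread > 1.5 * low_spread)  # True: variance is not constant
```

A formal alternative is the Breusch-Pagan test; the visual fan shape in a residual plot usually makes the problem obvious first.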
No Multicollinearity (for Multiple Regression)#
Input features should not be highly correlated with each other.
High correlation between features can:

- distort coefficient estimates
- make interpretation unstable
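The standard diagnostic is the variance inflation factor (VIF): regress each feature on the others and compute \(1 / (1 - R^2)\). Values near 1 indicate no collinearity; values above about 10 are usually considered problematic. Below is a minimal NumPy sketch (the three synthetic features, one of which is nearly a copy of another, are invented for the example):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: regress X[:, j]
    on the remaining columns and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)                   # independent of x1
x3 = x1 + rng.normal(scale=0.1, size=300)   # nearly a copy of x1

X = np.column_stack([x1, x2, x3])
print(vif(X, 1))       # near 1: x2 is not collinear with the others
print(vif(X, 0) > 10)  # True: x1 and x3 are highly collinear
```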
Why Assumptions Matter#
When these assumptions are violated:

- The model may still produce predictions
- But interpretations (coefficients, significance) become unreliable
- Confidence in results decreases
In practice:

- Always check residual plots
- Look for patterns, trends, or changing variance
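The workflow above can be sketched in a few lines (assuming Matplotlib is available; the well-behaved synthetic data and the output filename are illustrative choices). A healthy residual plot is a patternless horizontal band centred on zero.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 3 * x + 1 + rng.normal(size=200)  # data that satisfies the assumptions

a, b = np.polyfit(x, y, deg=1)
fitted = a * x + b
resid = y - fitted

# Residuals vs. fitted values: look for curves, trends, or fan shapes
fig, ax = plt.subplots()
ax.scatter(fitted, resid, s=10)
ax.axhline(0, color="red")
ax.set_xlabel("Fitted values")
ax.set_ylabel("Residuals")
fig.savefig("residuals.png")
```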
Summary#
For linear regression to work well, certain assumptions are made:

- Linearity: the relationship between input and output is linear.
- Independence of errors: errors are not correlated with each other.
- Homoscedasticity: the variance of errors is constant across all values of \(x\).
- Normality of errors: errors are normally distributed.
- No multicollinearity (important for multiple regression): features should not be highly correlated with each other.
Violating these assumptions may lead to unreliable predictions.