Assumptions of Linear Regression (L.I.N.E)#
For linear regression to produce reliable and interpretable results, certain assumptions must hold.
A helpful way to remember them is L.I.N.E:

- L — Linearity
- I — Independence of errors
- N — Normality of errors
- E — Equal variance (homoscedasticity)
1. Linearity#
The relationship between the input variables and the target should be linear.
If the true relationship is curved, a linear model will underfit the data.
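A quick way to see this underfitting is to fit a straight line to deliberately curved data and inspect the residuals. The sketch below (a minimal illustration using NumPy; the quadratic data and noise level are invented for the example) shows the telltale pattern: residuals are positive at the extremes and negative in the middle.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.5, size=x.size)  # true relationship is curved

# Fit a straight line y = a*x + b by least squares
a, b = np.polyfit(x, y, deg=1)
residuals = y - (a * x + b)

# A linear fit to curved data leaves structure in the residuals:
# positive at both ends of the range, negative in the middle.
print(residuals[:10].mean() > 0)     # left end: True
print(residuals[95:105].mean() < 0)  # middle: True
```

If the residuals showed no such pattern, the linearity assumption would be more plausible.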
2. Independence of Errors#
Residuals (errors) should not be correlated with each other.
This is especially important in time-series data, where errors can depend on previous observations.
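A common diagnostic for correlated errors is the Durbin-Watson statistic, sketched below with plain NumPy (the AR(1) error series and its 0.9 coefficient are illustrative choices, not part of any particular dataset). Values near 2 are consistent with independent errors; values near 0 suggest positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 suggests no autocorrelation,
    values near 0 suggest positive autocorrelation."""
    diff = np.diff(residuals)
    return np.sum(diff**2) / np.sum(residuals**2)

rng = np.random.default_rng(1)
independent = rng.normal(size=500)

# AR(1) errors: each error depends strongly on the previous one,
# as often happens in time-series data
correlated = np.empty(500)
correlated[0] = rng.normal()
for t in range(1, 500):
    correlated[t] = 0.9 * correlated[t - 1] + rng.normal()

print(durbin_watson(independent))  # close to 2
print(durbin_watson(correlated))   # well below 2
```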
3. Normality of Errors#
Residuals should be approximately normally distributed.
This assumption is particularly important for:

- statistical inference
- confidence intervals
- hypothesis testing
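One way to check this assumption is a formal normality test such as Shapiro-Wilk. The sketch below assumes SciPy is available; the two residual series are synthetic examples, one normal and one deliberately skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_resid = rng.normal(size=300)
skewed_resid = rng.exponential(size=300) - 1  # clearly non-normal

# Shapiro-Wilk test: a small p-value means normality is rejected
_, p_normal = stats.shapiro(normal_resid)
_, p_skewed = stats.shapiro(skewed_resid)
print(f"normal residuals:  p = {p_normal:.3f}")
print(p_skewed < 0.05)  # True: normality rejected for the skewed errors
```

In practice a Q-Q plot of the residuals gives the same information visually.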
4. Equal Variance (Homoscedasticity)#
The spread of residuals should remain constant across all values of the input.
If variance changes (heteroscedasticity), predictions may become unreliable.
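Heteroscedasticity is often visible by simply comparing residual spread across the range of the input. Below is a minimal NumPy sketch (the noise model, where the error scale grows with \(x\), is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 400)

# Heteroscedastic errors: the noise scale grows with x
y = 2 * x + rng.normal(scale=0.5 * x)

a, b = np.polyfit(x, y, deg=1)
resid = y - (a * x + b)

# Compare residual spread for small x vs. large x
low_spread = resid[:200].std()
high_spread = resid[200:].std()
print(high_spread > 1.5 * low_spread)  # True: variance is not constant
```

A formal alternative is the Breusch-Pagan test; the visual fan shape in a residual plot usually makes the problem obvious first.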
No Multicollinearity (for Multiple Regression)#
Input features should not be highly correlated with each other.
High correlation between features can:

- distort coefficient estimates
- make interpretation unstable
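The standard diagnostic is the variance inflation factor (VIF): regress each feature on the others and compute \(1 / (1 - R^2)\). Values near 1 indicate no collinearity; values above about 10 are usually considered problematic. Below is a minimal NumPy sketch (the three synthetic features, one of which is nearly a copy of another, are invented for the example):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: regress X[:, j]
    on the remaining columns and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)                   # independent of x1
x3 = x1 + rng.normal(scale=0.1, size=300)   # nearly a copy of x1

X = np.column_stack([x1, x2, x3])
print(vif(X, 1))       # near 1: x2 is not collinear with the others
print(vif(X, 0) > 10)  # True: x1 and x3 are highly collinear
```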
Why Assumptions Matter#
When these assumptions are violated:

- The model may still produce predictions
- But interpretations (coefficients, significance) become unreliable
- Confidence in results decreases
In practice:

- Always check residual plots
- Look for patterns, trends, or changing variance
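The workflow above can be sketched in a few lines (assuming Matplotlib is available; the well-behaved synthetic data and the output filename are illustrative choices). A healthy residual plot is a patternless horizontal band centred on zero.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 3 * x + 1 + rng.normal(size=200)  # data that satisfies the assumptions

a, b = np.polyfit(x, y, deg=1)
fitted = a * x + b
resid = y - fitted

# Residuals vs. fitted values: look for curves, trends, or fan shapes
fig, ax = plt.subplots()
ax.scatter(fitted, resid, s=10)
ax.axhline(0, color="red")
ax.set_xlabel("Fitted values")
ax.set_ylabel("Residuals")
fig.savefig("residuals.png")
```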
Summary#
For linear regression to work well, certain assumptions are made:

- Linearity: the relationship between input and output is linear.
- Independence of errors: errors are not correlated with each other.
- Homoscedasticity: the variance of errors is constant across all values of \(x\).
- Normality of errors: errors are normally distributed.
- No multicollinearity (important for multiple regression): features should not be highly correlated with each other.
Violating these assumptions may lead to unreliable predictions.