Multiple Linear Regression
Core Titles
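Key headlines and terms for quick recall
- Multiple Linear Regression (MLR): many predictors
- Model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$
- Matrix form: $y = X\beta + \varepsilon$
- OLS estimate: $\hat{\beta} = (X^\top X)^{-1} X^\top y$
- Adjusted $R^2$: penalises extra predictors
- Multicollinearity: VIF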
- Categorical encoding, interactions
- Assumptions: LINE + no multicollinearity
Basic Idea
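What it is, why it matters, how it works
What it is
Multiple Linear Regression (MLR) generalises simple linear regression (SLR) to multiple predictors:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$
Each $\beta_j$ is the expected change in $y$ for a 1-unit change in $x_j$, holding other features constant.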
Matrix form
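Stack predictors into a design matrix $X \in \mathbb{R}^{n \times (p+1)}$ (with a leading column of 1s for the intercept):
$$y = X\beta + \varepsilon$$
OLS solution:
$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$
(Same idea as SLR: minimise the sum of squared errors $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \lVert y - X\beta \rVert^2$.)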
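The OLS solution can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `fit_ols` and the synthetic data are hypothetical and not part of these notes. It solves the normal equations with a linear solver instead of forming the inverse explicitly, which is usually the more stable choice.

```python
import numpy as np

def fit_ols(X_raw, y):
    """Return OLS coefficients [b0, b1, ..., bp] via the normal equations."""
    n = X_raw.shape[0]
    # Design matrix: leading column of 1s for the intercept.
    X = np.column_stack([np.ones(n), X_raw])
    # Solve (X^T X) beta = X^T y rather than inverting X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (illustrative only): y = 2 + 3*x1 - 1*x2 + small noise.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1] + rng.normal(scale=0.1, size=200)

beta_hat = fit_ols(X_raw, y)
print(beta_hat)  # values close to [2, 3, -1]
```

With low noise and 200 rows, the recovered coefficients land near the values used to generate the data.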
Goodness of fit
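- SSE $= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
- $R^2 = 1 - \dfrac{\text{SSE}}{\text{SST}}$, where SST $= \sum_{i=1}^{n} (y_i - \bar{y})^2$.
- Adjusted $R^2$ penalises extra predictors: $\bar{R}^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - p - 1}$. Adding a feature raises it only if the feature reduces SSE enough to overcome the penalty, which makes it useful for model selection.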
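As a small worked example of these fit measures, the sketch below computes $R^2$ and adjusted $R^2$ from observed and fitted values. The helper name `r2_scores` and the six data points are made up for illustration; `p` is the number of predictors, excluding the intercept.

```python
import numpy as np

def r2_scores(y, y_hat, p):
    """Return (R^2, adjusted R^2) for a model with p predictors plus an intercept."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Hypothetical observed and fitted values.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

r2, adj_r2 = r2_scores(y, y_hat, p=2)
print(r2, adj_r2)  # adjusted R^2 is the smaller of the two
```

Here SSE is 0.12 and SST is 17.5, so $R^2 \approx 0.993$, and the penalty factor $(n-1)/(n-p-1) = 5/3$ pulls the adjusted value slightly lower.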
Categorical predictors
- One-hot encode: $K$ categories → $K - 1$ dummy variables (drop one as reference).
- Each dummy coefficient is interpreted as the effect relative to the reference category.
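A minimal sketch of reference-level dummy coding, using plain NumPy so that the mechanics are visible. The helper `one_hot_drop_reference` is a hypothetical name; in practice a library encoder would normally be used instead.

```python
import numpy as np

def one_hot_drop_reference(values, reference):
    """Encode a categorical column as K-1 dummy columns, dropping `reference`."""
    levels = sorted(set(values))
    kept = [lv for lv in levels if lv != reference]   # the K-1 non-reference levels
    dummies = np.array([[1.0 if v == lv else 0.0 for lv in kept] for v in values])
    return dummies, kept

# Illustrative column with four levels; "N" is chosen as the reference.
region = ["N", "S", "E", "W", "S", "N"]
D, names = one_hot_drop_reference(region, reference="N")

print(names)  # ['E', 'S', 'W']
print(D[0])   # a reference-level row is all zeros
```

Rows belonging to the reference level are all zeros, so the intercept carries the reference category's baseline and each dummy coefficient is a difference from it.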
Interactions
- Add an interaction term $x_i x_j$ if the effect of one feature depends on the value of another.
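To make the interaction idea concrete, here is a sketch on synthetic data where the slope on `x1` depends on `x2`. All names and the generating coefficients are assumptions chosen for illustration. The product column is simply appended to the design matrix and fitted like any other predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Illustrative ground truth: the slope on x1 is (1 + 2*x2), so it depends on x2.
y = 0.5 + 1.0 * x1 + 0.0 * x2 + 2.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Columns: intercept, x1, x2, interaction x1*x2.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # values close to [0.5, 1.0, 0.0, 2.0]
```

With the interaction included, the effect of a 1-unit change in `x1` is `beta_hat[1] + beta_hat[3] * x2` rather than a single constant.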
Assumptions
Same as SLR plus:
- No multicollinearity: predictors are not strongly linearly dependent. Otherwise $X^\top X$ is near-singular and coefficient variances explode.
- Detect via:
  - Pairwise correlations above 0.9 in absolute value are red flags.
  - Variance Inflation Factor $\text{VIF}_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ comes from regressing $x_j$ on the other predictors. $\text{VIF}_j > 5$ or 10 is concerning.
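The VIF check can be sketched directly from its definition, $\text{VIF}_j = 1/(1 - R_j^2)$: regress each predictor on the others and record the resulting $R_j^2$. The function name `vif` and the synthetic columns are assumptions for illustration.

```python
import numpy as np

def vif(X_raw):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of X_raw (no intercept column)."""
    n, p = X_raw.shape
    out = np.empty(p)
    for j in range(p):
        target = X_raw[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X_raw, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = a + rng.normal(scale=0.1, size=300)   # nearly a copy of a
c = rng.normal(size=300)                  # unrelated to a and b

vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # large for a and b, near 1 for c
```

The two near-duplicate columns both get very large VIFs, while the unrelated column stays close to 1.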
Coefficient interpretation
$\beta_j$ is the expected change in $y$ per unit increase in $x_j$, holding all other predictors constant. The "ceteris paribus" caveat matters: predictors must vary independently in the data for this interpretation to hold.
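Regularisation: when $p$ is large or features are correlated
- Ridge: adds an L2 penalty $\lambda \lVert \beta \rVert_2^2$; shrinks all coefficients.
- Lasso: adds an L1 penalty $\lambda \lVert \beta \rVert_1$; performs feature selection.
- Elastic Net: both penalties combined.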
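Of the three, Ridge has a simple closed form, so it is the easiest to sketch without extra libraries (Lasso and Elastic Net need an iterative solver and are not shown). The sketch below assumes centred data so that no intercept needs to be penalised. The name `fit_ridge` and the value `lam=10.0` are illustrative choices, not recommendations.

```python
import numpy as np

def fit_ridge(X_raw, y, lam):
    """Ridge coefficients on centred data: (Xc^T Xc + lam*I)^{-1} Xc^T yc."""
    Xc = X_raw - X_raw.mean(axis=0)   # centring handles the intercept separately
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # strongly correlated with x1
X_raw = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=100)

b_ols = fit_ridge(X_raw, y, lam=0.0)      # lam = 0 reduces to OLS
b_ridge = fit_ridge(X_raw, y, lam=10.0)

# The ridge coefficient vector has the smaller norm.
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

Setting `lam=0.0` recovers the OLS fit, which makes the shrinkage effect easy to compare on the same data.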
Diagnostic plots
- Residual vs fitted — random scatter ideal.
- Q-Q plot — normality.
- Scale-location — homoscedasticity.
- Residual vs leverage — Cook's distance for influential points.
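The quantities behind the residual-vs-leverage plot can be computed by hand. This sketch assumes the standard definitions: leverage $h_{ii}$ is the diagonal of the hat matrix $X(X^\top X)^{-1}X^\top$, and Cook's distance is $D_i = \dfrac{e_i^2}{k\,s^2}\cdot\dfrac{h_{ii}}{(1-h_{ii})^2}$ with $k$ fitted parameters. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Return (leverage, Cook's distance); X must already include the intercept column."""
    n, k = X.shape                      # k = number of fitted parameters
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Hat matrix diagonal: h_ii = x_i^T (X^T X)^{-1} x_i
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    s2 = resid @ resid / (n - k)        # residual variance estimate
    d = resid**2 / (k * s2) * h / (1.0 - h) ** 2
    return h, d

rng = np.random.default_rng(4)
x = np.append(np.arange(20.0), 40.0)    # last point sits far from the rest in x
y = 1.0 + 0.5 * x + rng.normal(scale=0.2, size=21)
y[-1] += 5.0                            # and it is also off the trend
X = np.column_stack([np.ones(21), x])

h, d = cooks_distance(X, y)
print(np.argmax(h), np.argmax(d))       # both flag the last point, index 20
```

A point needs both high leverage and a sizeable residual to earn a large Cook's distance, which is why the planted outlier stands out on both measures.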
Worked sketch — yield prediction
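$$\widehat{\text{yield}} = \hat{\beta}_0 + 0.005\,\text{rainfall} - 0.08\,\text{temperature} + 0.3\,\text{fertiliser}$$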
Interpretation:
- 1 mm extra rainfall → +0.005 t/ha yield.
- 1 °C extra → −0.08 t/ha (heat stress).
- 1 unit fertiliser → +0.3 t/ha.
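Because the model is linear, the predicted change between two scenarios depends only on the slopes, so the intercept cancels. The short sketch below applies the three coefficients listed above to a hypothetical scenario (the changes of +100 mm, +2 °C and +1 unit are made up for illustration).

```python
# Slopes taken from the interpretation bullets above.
coef = {"rainfall_mm": 0.005, "temperature_c": -0.08, "fertiliser_units": 0.3}

# Hypothetical change between two seasons.
delta = {"rainfall_mm": 100.0, "temperature_c": 2.0, "fertiliser_units": 1.0}

# Linear model: change in prediction = sum of slope * change in predictor.
delta_yield = sum(coef[k] * delta[k] for k in coef)
print(round(delta_yield, 3))  # 0.64 (t/ha): 0.5 - 0.16 + 0.3
```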
Why it matters
- Workhorse of statistics and econometrics.
- Strong, interpretable baseline before going non-linear.
- Foundation for logistic, ridge, lasso, GLMs.
Mind Map
Visual structure of the concept
MULTIPLE LINEAR REGRESSION
├── Model: ŷ = β₀ + Σ βⱼ xⱼ
├── Matrix form: ŷ = Xβ
├── OLS: β̂ = (XᵀX)⁻¹ Xᵀy
├── Fit measures
│ ├── R²
│ └── Adjusted R² (penalises p)
├── Categorical → one-hot (K−1 dummies)
├── Interactions: xᵢ · xⱼ
├── Assumptions (LINE + no multicollinearity)
├── Multicollinearity check
│ ├── Pairwise corr > 0.9
│ └── VIF > 5 or 10
└── Regularisation
    ├── Ridge (L2)
    ├── Lasso (L1)
    └── Elastic Net
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions
Part A (2 marks each)
Q1. Write the general form of multiple linear regression. $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$. In matrix form $y = X\beta + \varepsilon$.
Q2. How is the OLS estimate of $\beta$ computed in matrix form? $\hat{\beta} = (X^\top X)^{-1} X^\top y$.
Q3. What is multicollinearity? How can you detect it? A condition where two or more predictors are strongly linearly related, making $X^\top X$ near-singular and inflating coefficient variances. Detected via pairwise correlations or the Variance Inflation Factor $\text{VIF}_j = 1/(1 - R_j^2)$, with VIF $> 5$ or 10 considered problematic.
Part B (20 marks)
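Q. Discuss Multiple Linear Regression. Derive the OLS estimator in matrix form. Explain how to handle multicollinearity, categorical predictors, and how Adjusted $R^2$ differs from $R^2$. Give an example application.
Model. $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$, or in matrix form
$$y = X\beta + \varepsilon$$
Derivation of OLS. Minimise
$$S(\beta) = \lVert y - X\beta \rVert^2 = (y - X\beta)^\top (y - X\beta)$$
Differentiate w.r.t. $\beta$:
$$\frac{\partial S}{\partial \beta} = -2 X^\top (y - X\beta) = 0$$
Solve:
$$X^\top X \hat{\beta} = X^\top y \quad\Longrightarrow\quad \hat{\beta} = (X^\top X)^{-1} X^\top y$$
(Assumes $X^\top X$ is invertible, i.e., predictors are linearly independent and $n \ge p + 1$.)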
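For completeness, the differentiation step can be spelled out by expanding the quadratic form. This is the standard expansion, written as a sketch in the same notation, with $S(\beta)$ denoting the sum of squared errors being minimised:

$$
\begin{aligned}
S(\beta) &= (y - X\beta)^\top (y - X\beta) = y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta, \\
\nabla_\beta S &= -2 X^\top y + 2 X^\top X \beta, \\
\nabla_\beta^2 S &= 2 X^\top X \succeq 0 .
\end{aligned}
$$

Setting the gradient to zero gives the normal equations $X^\top X \hat{\beta} = X^\top y$. Because the Hessian $2 X^\top X$ is positive semi-definite (and positive definite when $X^\top X$ is invertible), the stationary point is a minimum, not a maximum or saddle point.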
Categorical predictors.
One-hot encode a $K$-level categorical into $K - 1$ dummy columns (one reference dropped). Each coefficient is the effect relative to the reference. E.g., region with levels {N, S, E, W} and reference N gives dummies $D_S, D_E, D_W$; their coefficients give the average $y$-difference vs region N.
Multicollinearity.
- Effect. $X^\top X$ near singular → coefficients unstable, large standard errors, wrong signs, p-values unreliable.
- Detection.
  - Pairwise correlation matrix; values with $|r| > 0.9$ are suspicious.
  - Variance Inflation Factor $\text{VIF}_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is from regressing $x_j$ on the remaining predictors. $\text{VIF}_j > 5$ or $> 10$ is problematic.
  - Condition number of $X^\top X$.
- Treatment.
  - Drop one of the offending features.
  - Combine into a single composite (PCA, summed index).
  - Use Ridge regression, which adds $\lambda I$ to $X^\top X$, stabilising the inverse.
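The stabilising effect of the Ridge term can be seen through the condition number. The sketch below builds two almost collinear columns and compares the condition number of $X^\top X$ with and without $\lambda I$ added. The data and `lam = 1.0` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # almost collinear with x1
X = np.column_stack([x1, x2])

gram = X.T @ X
lam = 1.0  # illustrative ridge strength, not a recommended value

cond_plain = np.linalg.cond(gram)                     # very large: near-singular
cond_ridge = np.linalg.cond(gram + lam * np.eye(2))   # much smaller

print(cond_plain, cond_ridge)
```

Adding $\lambda I$ shifts every eigenvalue of $X^\top X$ up by $\lambda$, which lifts the smallest eigenvalue away from zero and so reduces the ratio of largest to smallest.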
$R^2$ vs Adjusted $R^2$.
| Metric | Formula | Behaviour |
|---|---|---|
| $R^2$ | $1 - \dfrac{\text{SSE}}{\text{SST}}$ | Never decreases when adding a predictor, even a useless one. |
| Adj. $R^2$ | $1 - (1 - R^2)\dfrac{n - 1}{n - p - 1}$ | Adds penalty for $p$; can decrease if new feature doesn't help enough. Better for model comparison. |
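The behaviour in the table can be checked empirically. This sketch fits a model with one real predictor, then refits after appending a pure-noise column. All names and data are hypothetical. $R^2$ cannot go down on the larger model, whereas adjusted $R^2$ may move either way depending on how much the noise column happens to reduce SSE.

```python
import numpy as np

def fit_and_score(X_raw, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, p = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    return r2, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(6)
n = 60
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)
junk = rng.normal(size=(n, 1))            # pure noise, unrelated to y

r2_small, adj_small = fit_and_score(x, y)
r2_big, adj_big = fit_and_score(np.column_stack([x, junk]), y)

print(r2_small, r2_big)     # R^2 does not decrease
print(adj_small, adj_big)   # adjusted R^2 can go either way
```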
Example — agricultural yield prediction.
Predict crop yield (t/ha) from rainfall (mm), temperature (°C), fertiliser (kg).
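After fitting on historical seasons:
$$\widehat{\text{yield}} = \hat{\beta}_0 + 0.005\,\text{rainfall} - 0.08\,\text{temperature} + 0.3\,\text{fertiliser}$$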
Interpretation:
- Each extra mm rainfall raises yield by 0.005 t/ha.
- Each extra °C reduces yield by 0.08 t/ha (heat stress).
- Each extra unit fertiliser raises yield by 0.3 t/ha.
Diagnostics. Plot residuals vs fitted (random scatter ✓), Q-Q plot (approximately normal ✓), check VIFs (rainfall and temperature highly correlated → VIF > 10 → either drop one, or use Ridge).
Why MLR matters.
- Interpretable, fast, strong baseline.
- Foundation for logistic regression, GLMs, Ridge / Lasso.
- Used in econometrics, finance, agriculture, marketing-mix modelling.