Regression and Diagnostics
OLS in the Finance Context¶
Ordinary Least Squares (OLS) regression is the workhorse of empirical finance. Factor models, event studies, and predictive regressions are all built on it. The canonical setup:
where is an asset’s excess return and are factor returns (e.g., Fama-French factors).
Gauss-Markov Assumptions¶
OLS is BLUE (Best Linear Unbiased Estimator) when:
Linearity: The model is linear in parameters
No perfect multicollinearity: Regressors are not perfectly linearly dependent
Zero conditional mean:
Homoskedasticity: (constant)
No autocorrelation: for
Financial data routinely violates assumptions 4 and 5. Understanding when this matters (inference) vs. when it doesn’t (point estimates) is essential.
The Fama-French Three-Factor Model¶
The three-factor model (Fama & French, 1993) extends the CAPM with size (SMB) and value (HML) factors:
A non-zero (Jensen’s alpha) indicates abnormal risk-adjusted return — the central object of interest in performance evaluation.
Key Diagnostic Tests¶
Heteroskedasticity¶
When is not constant, OLS standard errors are biased. The solution is heteroskedasticity-robust standard errors (White, 1980), which are now standard practice.
White test: Regress on all regressors, their squares, and cross-products. The test statistic .
Breusch-Pagan test: A simpler version assuming the variance is a linear function of the regressors.
Autocorrelation¶
Time series residuals are often autocorrelated. The Ljung-Box test checks for autocorrelation up to lag :
For financial return regressions, the Newey-West correction adjusts standard errors for both heteroskedasticity and autocorrelation (HAC standard errors).
Python Implementation¶
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, acorr_ljungbox
np.random.seed(1)
n = 500
# Simulate a simple factor model
mkt = np.random.normal(0.005, 0.02, n)
smb = np.random.normal(0.002, 0.01, n)
eps = np.random.normal(0, 0.01, n)
ret = 0.001 + 0.8 * mkt + 0.3 * smb + eps # true alpha = 0.001
X = sm.add_constant(np.column_stack([mkt, smb]))
model = sm.OLS(ret, X).fit()
# Standard errors comparison
print("=== Standard OLS SEs ===")
print(model.summary().tables[1])
model_hc = model.get_robustcov_results(cov_type='HC3')
print("\n=== Heteroskedasticity-Robust SEs (HC3) ===")
print(model_hc.summary().tables[1])
# Newey-West HAC standard errors
model_hac = model.get_robustcov_results(cov_type='HAC', maxlags=5)
print("\n=== Newey-West HAC SEs (5 lags) ===")
print(model_hac.summary().tables[1])
# Diagnostic tests
bp_stat, bp_pval, _, _ = het_breuschpagan(model.resid, X)
lb_result = acorr_ljungbox(model.resid, lags=[10], return_df=True)
print(f"\nBreusch-Pagan test: stat={bp_stat:.3f}, p={bp_pval:.4f}")
print(f"Ljung-Box test (10 lags): stat={lb_result['lb_stat'].values[0]:.3f}, "
f"p={lb_result['lb_pvalue'].values[0]:.4f}")When to Use Which Standard Errors¶
| Situation | Recommended SE |
|---|---|
| i.i.d. errors (rare in finance) | OLS standard |
| Heteroskedasticity only | HC3 robust |
| Autocorrelation + heteroskedasticity | Newey-West HAC |
| Panel data, clustering | Clustered SE |
References¶
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838.
Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.
Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.