
Lesson 03

ARIMA

Autoregressive Integrated Moving Average models combine three mechanisms: AR uses past values to predict, I handles non-stationarity through differencing, MA corrects using past forecast errors. Together they form the classical linear forecasting framework.

The AR component

An AR(p) model regresses xₜ on its own p lagged values xₜ₋₁, …, xₜ₋ₚ. In AR(1), the coefficient φ₁ controls persistence. Near +1 the series drifts slowly and can look like a trend. Near −1 it flips sign almost every step (rapid mean-reversion). Near 0 it is close to white noise.

$$\text{AR}(p):\quad x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t$$

$$\text{AR}(1):\quad x_t = \phi_1 x_{t-1} + \varepsilon_t$$

|φ₁| < 1 → stationary, mean-reverting
φ₁ = 1 → random walk (non-stationary)
φ₁ > 0 → momentum / persistence
φ₁ < 0 → oscillates (mean-reversion each step)
[Figure: AR(1) simulated series, 120 bars]
φ₁ near 0.9 produces a slowly drifting series that looks like a trend. Because |φ₁| < 1, however, it is mean-reverting: in expectation, each step shrinks the deviation from the mean by a factor of φ₁. φ₁ near −0.9 bounces rapidly above and below zero, similar to bid-ask bounce in tick data.
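A quick way to see these regimes is to simulate AR(1) paths and check that the sample lag-1 autocorrelation recovers φ₁. The sketch below assumes Gaussian shocks and uses NumPy. The helper names `simulate_ar1` and `lag1_autocorr`, the seed and the series lengths are all illustrative choices, not part of the lesson.

```python
import numpy as np


def simulate_ar1(phi: float, n: int = 120, c: float = 0.0,
                 sigma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Simulate x_t = c + phi * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    # Start at the unconditional mean c / (1 - phi), defined only when |phi| < 1;
    # for a random walk (phi = 1) start at 0 instead.
    x_prev = c / (1.0 - phi) if abs(phi) < 1 else 0.0
    x = np.empty(n)
    for t in range(n):
        x[t] = c + phi * x_prev + eps[t]
        x_prev = x[t]
    return x


def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation; for a stationary AR(1) it estimates phi."""
    d = x - x.mean()
    return float(np.dot(d[1:], d[:-1]) / np.dot(d, d))


# Longer series than the 120-bar chart so the estimate is stable
for phi in (0.9, 0.4, 0.0, -0.9):
    x = simulate_ar1(phi, n=5000)
    print(f"phi = {phi:+.1f}  sample lag-1 autocorrelation = {lag1_autocorr(x):+.3f}")
```

Plotting `simulate_ar1(0.9)` against `simulate_ar1(-0.9)` should reproduce the slow drift versus rapid sign-flipping described above.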

The MA component

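An MA(q) model uses the past q forecast errors (shocks) as predictors. Unlike AR, a finite-order MA model is stationary for any parameter values. It captures transient shocks: a shock at time t affects the series at times t, t + 1, …, t + q and then drops out entirely, so autocorrelations beyond lag q are zero.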

$$\text{MA}(q):\quad x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

The εₜ are i.i.d. white noise with mean 0 and variance σ².
θ₁ > 0 → a positive shock persists one period
θ₁ < 0 → a positive shock partially reverses next period
Bid-ask bounce. An MA(1) with θ₁ ≈ −0.5 is a classic model for microstructure noise in high-frequency data: a buy at the ask creates a positive return that partially reverses at the next quote. The textbook diagnostic is a single negative spike at lag 1 in the ACF, with nothing significant beyond it. For θ₁ = −0.5, the theoretical lag-1 autocorrelation is θ₁/(1 + θ₁²) = −0.4.
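To see the lag-1 spike, the sketch below simulates an MA(1) with θ₁ = −0.5 and compares its sample ACF with the theoretical value ρ₁ = θ₁/(1 + θ₁²). It assumes Gaussian shocks and uses NumPy. The helper names `simulate_ma1` and `sample_acf` and the sample size are hypothetical choices for illustration.

```python
import numpy as np


def simulate_ma1(theta: float, n: int, mu: float = 0.0,
                 sigma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Simulate x_t = mu + eps_t + theta * eps_{t-1} with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + 1)  # extra draw supplies eps_{t-1} at t = 0
    return mu + eps[1:] + theta * eps[:-1]


def sample_acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelations at lags 0..max_lag."""
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])


theta = -0.5                              # bid-ask-bounce style MA(1)
rho1_theory = theta / (1.0 + theta**2)    # MA(1): rho_1 = theta / (1 + theta^2)
acf = sample_acf(simulate_ma1(theta, n=20_000), max_lag=5)
print("theoretical rho_1:", rho1_theory)
print("sample ACF, lags 1-5:", np.round(acf[1:], 3))
```

The sample ACF should show one clearly negative value at lag 1 and values near zero from lag 2 onward, which is the cut-off pattern that identifies MA(1).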

Combining into ARIMA(p, d, q)

The "I" (integrated) in ARIMA refers to differencing: d is the number of times the series must be differenced before fitting ARMA. For financial prices, differencing the log price once (d = 1, giving log returns) is almost always sufficient.
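The differencing step itself is short. The minimal sketch below uses `np.diff` and turns log prices into log returns. The price values are made up for illustration, and `difference` is a hypothetical helper name.

```python
import numpy as np


def difference(x: np.ndarray, d: int = 1) -> np.ndarray:
    """The 'I' step of ARIMA: apply first differencing d times."""
    return np.diff(x, n=d)


# Illustrative, made-up closing prices
prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])

# d = 1 on log prices gives log returns r_t = log P_t - log P_{t-1}
log_returns = difference(np.log(prices), d=1)
print(np.round(log_returns, 4))

# Each round of differencing shortens the series by one observation
print(len(prices), len(difference(prices, d=1)), len(difference(prices, d=2)))
```

Because log returns telescope, they sum to the log of the total price ratio. A forecast of the differenced series can therefore be mapped back to price levels by cumulative summation.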

ARIMA(p, d, q):
Step 1: difference the series d times (e.g. log returns, d = 1).
Step 2: fit ARMA(p, q) to the differenced series:

$$x_t = c + \sum_{k=1}^{p} \phi_k x_{t-k} + \varepsilon_t + \sum_{k=1}^{q} \theta_k \varepsilon_{t-k}$$

Model selection rules of thumb:
ACF tails off, PACF cuts off at lag p → AR(p)
PACF tails off, ACF cuts off at lag q → MA(q)
Both tail off → ARMA(p, q)
AIC / BIC grid search over (p, q) → practical approach
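The rules of thumb rely on the ACF and the PACF. The sketch below computes both with NumPy only. It gets the PACF from the autocorrelations via the Durbin-Levinson recursion, one standard method among several. It then checks the cut-off patterns on theoretical AR(1) and MA(1) autocorrelations. The function names, the ±1.96/√n significance band (a common large-sample rule for white noise) and the simulated series are illustrative assumptions.

```python
import numpy as np


def sample_acf(x, max_lag: int) -> np.ndarray:
    """Sample autocorrelations rho_0..rho_max_lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / denom
                     for k in range(max_lag + 1)])


def pacf_from_acf(rho: np.ndarray) -> np.ndarray:
    """Partial autocorrelations at lags 1..K from rho_0..rho_K (Durbin-Levinson)."""
    K = len(rho) - 1
    pacf = np.zeros(K)
    phi = np.zeros(0)                     # AR(k-1) coefficients phi_{k-1,1..k-1}
    for k in range(1, K + 1):
        num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf[k - 1] = phi_kk
    return pacf


# Theoretical AR(1) ACF (rho_k = 0.6^k): PACF cuts off after lag 1
print(np.round(pacf_from_acf(0.6 ** np.arange(6)), 6))

# Theoretical MA(1) ACF with theta = -0.5 (rho_1 = -0.4, zero after): PACF tails off
print(np.round(pacf_from_acf(np.array([1.0, -0.4, 0, 0, 0, 0])), 4))

# On data, lags with |value| > 1.96 / sqrt(n) are the usual "significant" spikes
rng = np.random.default_rng(0)
e = rng.standard_normal(1001)
x = e[1:] - 0.5 * e[:-1]                  # simulated MA(1), theta = -0.5
band = 1.96 / np.sqrt(len(x))
print("band:", round(band, 3), "sample ACF:", np.round(sample_acf(x, 5)[1:], 3))
```

On real data, pass `sample_acf(x, K)` to `pacf_from_acf` to get the sample PACF, then read off where each function drops inside the band.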
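AIC vs BIC. Let L be the maximised likelihood, k the number of estimated parameters and n the sample size. AIC = −2·log L + 2k penalises complexity lightly, favouring richer models. BIC = −2·log L + k·log(n) charges log(n) per parameter instead of 2, which is a heavier penalty whenever n ≥ 8, so it favours parsimony. For financial returns, with their low signal-to-noise ratio, BIC often selects ARIMA(1,0,1) or ARIMA(0,0,1).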
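A grid search over orders can be sketched in a few lines. To stay self-contained, the sketch below restricts the search to pure AR(p) models fitted by least squares with a Gaussian likelihood. That is a simplification: estimating the MA terms needs full maximum likelihood, as done for example by the statsmodels `ARIMA` class, whose fitted results expose `aic` and `bic`. The helper names, the simulated AR(2) coefficients and `p_max = 6` are illustrative assumptions. Every order is fitted on the same observations so the criteria are comparable.

```python
import numpy as np


def ar_loglik_ols(x: np.ndarray, p: int, p_max: int):
    """Fit AR(p) with intercept by least squares on x[p_max:] and return
    (Gaussian log-likelihood, number of parameters k, sample size n).
    Every order uses the same n observations so AIC/BIC are comparable."""
    n = len(x) - p_max
    y = x[p_max:]
    lags = [x[p_max - j: len(x) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                       # MLE of innovation variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = p + 2                                        # phi_1..phi_p, c, sigma^2
    return loglik, k, n


def select_ar_order(x: np.ndarray, p_max: int = 6):
    """Grid search over p = 0..p_max; return (best order by AIC, best order by BIC)."""
    aic, bic = [], []
    for p in range(p_max + 1):
        loglik, k, n = ar_loglik_ols(x, p, p_max)
        aic.append(-2 * loglik + 2 * k)
        bic.append(-2 * loglik + k * np.log(n))
    return int(np.argmin(aic)), int(np.argmin(bic))


# Simulated AR(2): x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + eps_t
rng = np.random.default_rng(1)
burn, n_obs = 200, 2000
eps = rng.standard_normal(burn + n_obs)
x = np.zeros(burn + n_obs)
for t in range(2, burn + n_obs):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + eps[t]
x = x[burn:]                                         # drop burn-in

p_aic, p_bic = select_ar_order(x)
print("AIC picks p =", p_aic, "| BIC picks p =", p_bic)
```

Because both criteria share the −2·log L term and BIC adds a larger per-parameter penalty once log(n) > 2, BIC never picks a larger order than AIC on the same fits.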

Forecasting with confidence bands

ARIMA produces h-step-ahead point forecasts with bands that widen with the horizon: the further ahead, the greater the uncertainty. For a stationary series, the point forecast converges to the unconditional mean. For return series (mean ≈ 0), long-horizon forecasts therefore converge to zero, an honest admission of ignorance.
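For the simplest case, an AR(1), the forecasts and bands have closed forms. The point forecast decays geometrically toward the mean μ = c/(1 − φ), and the h-step error variance is a geometric sum of shock variances. The sketch below assumes Gaussian shocks and treats the parameters as known, so it ignores estimation uncertainty. The function name `ar1_forecast` and the example inputs are hypothetical.

```python
import numpy as np


def ar1_forecast(x_last: float, c: float, phi: float, sigma: float, h: int) -> dict:
    """Point forecasts and 68% / 95% bands for x_t = c + phi * x_{t-1} + eps_t,
    eps_t ~ N(0, sigma^2), |phi| < 1, parameters treated as known."""
    mu = c / (1.0 - phi)                         # unconditional mean
    steps = np.arange(1, h + 1)
    mean = mu + phi ** steps * (x_last - mu)     # decays geometrically toward mu
    var = sigma**2 * (1.0 - phi ** (2 * steps)) / (1.0 - phi**2)
    sd = np.sqrt(var)
    return {
        "mean": mean,
        "sd": sd,
        "band68": (mean - sd, mean + sd),            # about +/- 1 sd
        "band95": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


# A zero-mean return series, last observation 2.0, phi = 0.3, sigma = 1
fc = ar1_forecast(x_last=2.0, c=0.0, phi=0.3, sigma=1.0, h=20)
for h in (1, 2, 5, 20):
    lo, hi = fc["band95"][0][h - 1], fc["band95"][1][h - 1]
    print(f"h={h:2d}  forecast={fc['mean'][h - 1]:+.4f}  95% band=[{lo:+.3f}, {hi:+.3f}]")
```

The forecast column collapses to zero within a few steps, while the band width levels off at the unconditional standard deviation σ/√(1 − φ²).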

[Figure: history (blue) followed by a 20-step ARIMA forecast (orange) with 68% and 95% confidence bands]
Forecast uncertainty grows with the horizon: the 1-step band is tight, the 20-step band is wide. For an AR(1) with φ = 0.3, the forecast error variance after h steps is σ²·(1 − φ²ʰ)/(1 − φ²). The unconditional variance is σ²/(1 − φ²), so their ratio is 1 − φ²ʰ. At h = 1 this ratio is already 1 − 0.09 = 91%, and by h = 2 it exceeds 99%. With φ = 0.3, the model has almost no predictive content beyond a couple of bars.
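The variance formula follows from writing the h-step forecast error as a weighted sum of future shocks. A short derivation for the AR(1) case, assuming i.i.d. shocks with variance σ², known parameters and μ = c/(1 − φ):

$$
\begin{aligned}
x_{T+h} - \mu &= \phi^{h}(x_T - \mu) + \sum_{j=0}^{h-1} \phi^{j}\,\varepsilon_{T+h-j}, \qquad \hat{x}_{T+h\mid T} = \mu + \phi^{h}(x_T - \mu) \\
\operatorname{Var}\big(x_{T+h} - \hat{x}_{T+h\mid T}\big) &= \sigma^2 \sum_{j=0}^{h-1} \phi^{2j} = \sigma^2\,\frac{1-\phi^{2h}}{1-\phi^{2}} \\
\frac{\operatorname{Var}\big(x_{T+h} - \hat{x}_{T+h\mid T}\big)}{\operatorname{Var}(x_t)} &= \frac{\sigma^2 (1-\phi^{2h})/(1-\phi^{2})}{\sigma^2/(1-\phi^{2})} = 1-\phi^{2h}
\end{aligned}
$$

With φ = 0.3: h = 1 gives 1 − 0.09 = 0.91, h = 2 gives 1 − 0.0081 = 0.9919, and h = 10 gives 1 − 0.3²⁰ ≈ 1 − 3.5 × 10⁻¹¹.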