Autoregressive Integrated Moving Average (ARIMA) models combine three mechanisms: AR predicts from past values, I removes non-stationarity through differencing, and MA corrects using past forecast errors. Together they form the classical linear forecasting framework.
The AR component
An AR(p) model regresses xt on its own p lagged values. In the simplest case, AR(1): xt = c + φ₁xt−1 + εt, the coefficient φ₁ controls persistence: near +1 gives a slowly drifting series, near −1 gives a series that flips sign almost every step, and near 0 gives something close to white noise.
φ₁ near 0.9 produces a slowly drifting series that looks like a trend but is still stationary and is pulled back toward its mean over time. φ₁ near −0.9 bounces rapidly above and below the mean, similar to bid-ask bounce in tick data.
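The effect of φ₁ can be checked with a short simulation. The sketch below is a minimal illustration in plain Python, assuming a zero-mean AR(1) with Gaussian noise; the helper names simulate_ar1 and autocorr are hypothetical and chosen only for this example. The sample lag-1 autocorrelation should land close to φ₁ in each case.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with Gaussian white noise (assumed)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Compare a persistent, a memoryless, and an oscillating series.
for phi in (0.9, 0.0, -0.9):
    series = simulate_ar1(phi, n=5000)
    print(f"phi={phi:+.1f}  lag-1 autocorrelation={autocorr(series, 1):+.3f}")
```

A positive lag-1 autocorrelation means consecutive values tend to sit on the same side of the mean (slow drift); a negative one means they tend to alternate sides (bounce).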
The MA component
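An MA(q) model expresses xt as a linear combination of the current shock and the past q shocks (one-step forecast errors). Unlike AR, a finite-order MA is always stationary regardless of its parameters. It captures transient shocks: a shock at time t affects the series for exactly q further periods and then vanishes.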
MA(q): xt = μ + εt + θ₁εt−1 + θ₂εt−2 + ... + θqεt−q
The εt are i.i.d. white noise with mean 0, variance σ²
θ₁ > 0 → positive shock persists one period
θ₁ < 0 → positive shock partially reverses next period
Bid-ask bounce. An MA(1) with θ₁ ≈ −0.5 is a classic model for microstructure noise in high-frequency data: a buy at the ask creates a positive return that partially reverses at the next quote. Detecting this pattern in the ACF (a single negative spike at lag 1 and nothing significant beyond it) is a textbook diagnostic.
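The lag-1 spike can be reproduced numerically. The sketch below is an illustration under stated assumptions (Gaussian shocks, μ = 0, hypothetical helper names simulate_ma1 and acf); it simulates an MA(1) with θ₁ = −0.5 and compares the sample ACF with the theoretical lag-1 value θ₁/(1 + θ₁²), which equals −0.4 here.

```python
import random


def simulate_ma1(theta, n, mu=0.0, sigma=1.0, seed=1):
    """Simulate x_t = mu + eps_t + theta * eps_{t-1} with Gaussian shocks (assumed)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    return [mu + eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]


def acf(x, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(1, max_lag + 1)]


theta = -0.5
x = simulate_ma1(theta, n=20000)
sample = acf(x, max_lag=3)
theory_lag1 = theta / (1 + theta ** 2)  # MA(1) lag-1 autocorrelation

print("theoretical lag-1:", theory_lag1)
for lag, value in enumerate(sample, start=1):
    print(f"sample ACF lag {lag}: {value:+.3f}")
```

Lags 2 and 3 should be close to zero, which is the "cut-off after lag q" signature of an MA(q) process.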
Combining into ARIMA(p, d, q)
The "I" in ARIMA stands for "integrated": d is the number of times the series must be differenced to become stationary before fitting ARMA. For financial prices, d = 1 applied to log prices (which yields log returns) is almost always sufficient.
ARIMA(p, d, q):
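Step 1: Difference the series d times (e.g. differencing log prices once, d = 1, gives log returns)
Step 2: Fit ARMA(p, q) to the differenced series:
xt = c + [Σ(k=1..p) φk·xt−k] + εt + [Σ(k=1..q) θk·εt−k], where the first sum is the AR part and the second sum is the MA part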
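Step 1 can be sketched in a few lines. The example below is a minimal illustration, not a full ARIMA fit: the function name difference and the price values are hypothetical, chosen only to show that differencing log prices once yields log returns.

```python
import math


def difference(series, d=1):
    """Apply first differencing d times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical closing prices, for illustration only.
prices = [100.0, 101.5, 100.8, 102.3, 103.0]
log_prices = [math.log(p) for p in prices]

# d = 1 on log prices gives log returns.
log_returns = difference(log_prices, d=1)
print([round(r, 5) for r in log_returns])

# Log returns add up to the log of the total price ratio.
print(sum(log_returns), math.log(prices[-1] / prices[0]))
```

The ARMA(p, q) model of Step 2 is then fitted to log_returns rather than to the price level.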
Model selection rules of thumb:
ACF tails off, PACF cuts off at lag p → AR(p)
PACF tails off, ACF cuts off at lag q → MA(q)
Both tail off → ARMA(p,q)
AIC / BIC grid search over (p,q) → practical approach
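The ACF/PACF rules can be illustrated on simulated data. The sketch below assumes an AR(1) with φ₁ = 0.6 and Gaussian noise, and computes the sample PACF with the Durbin-Levinson recursion; the function names are hypothetical. The ACF should decay gradually while the PACF drops to roughly zero after lag 1, matching the AR(p) rule.

```python
import random


def simulate_ar1(phi, n, seed=2):
    """Simulate a zero-mean AR(1) with unit-variance Gaussian noise (assumed)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sample_acf(x, max_lag):
    """Sample autocorrelations r[0..max_lag], with r[0] = 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(max_lag + 1)]


def sample_pacf(r):
    """Partial autocorrelations for lags 1..len(r)-1 via Durbin-Levinson."""
    pacf = []
    prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


x = simulate_ar1(0.6, n=10000)
r = sample_acf(x, max_lag=4)
p = sample_pacf(r)
for lag in range(1, 5):
    print(f"lag {lag}: ACF={r[lag]:+.3f}  PACF={p[lag - 1]:+.3f}")
```

For an MA(q) series the roles swap: the ACF cuts off after lag q and the PACF tails off.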
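AIC vs BIC. AIC = −2·log L + 2k, where L is the maximised likelihood and k the number of fitted parameters, penalises complexity lightly and favours richer models. BIC = −2·log L + k·log(n), with n the sample size, penalises complexity more heavily whenever log(n) > 2 (n ≥ 8) and favours parsimony. For financial returns with a low signal-to-noise ratio, BIC often selects small models such as ARIMA(1,0,1) or ARIMA(0,0,1).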
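The difference between the two penalties shows up in a small worked example. The log-likelihood values, parameter counts, and sample size below are hypothetical numbers chosen for illustration, not results from a fitted model.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """Bayesian information criterion: -2 log L + k log(n)."""
    return -2.0 * log_likelihood + k * math.log(n)


n = 1000  # hypothetical sample size

# Hypothetical candidates: a small model and a larger one with a slightly better fit.
small = {"log_l": -1400.0, "k": 2}
large = {"log_l": -1396.0, "k": 5}

aic_small, aic_large = aic(small["log_l"], small["k"]), aic(large["log_l"], large["k"])
bic_small = bic(small["log_l"], small["k"], n)
bic_large = bic(large["log_l"], large["k"], n)

print("AIC small / large:", aic_small, aic_large)
print("BIC small / large:", round(bic_small, 2), round(bic_large, 2))
```

With these inputs AIC prefers the larger model (2802 against 2804), while BIC prefers the smaller one because each extra parameter costs log(1000) ≈ 6.9 instead of 2. Lower values are better for both criteria.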
Forecasting with confidence bands
ARIMA produces h-step-ahead point forecasts with confidence bands that widen as the horizon grows. For a stationary return series (mean ≈ 0), long-horizon forecasts converge to the unconditional mean, which is approximately zero: an honest admission of ignorance.
Figure: history (blue) followed by a 20-step ARIMA forecast (orange) with 68% and 95% confidence bands.
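Forecast uncertainty grows with horizon. The 1-step interval is the tightest, and later intervals widen toward the unconditional spread of the series. For an AR(1) with φ = 0.3, the forecast variance after h steps is σ²·(1 − φ²ʰ)/(1 − φ²), which is the fraction 1 − φ²ʰ of the unconditional variance σ²/(1 − φ²). At h = 1 this is already 91%, at h = 2 about 99%, and by h = 10 the two are indistinguishable. The model has almost no predictive content beyond a couple of bars.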
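The AR(1) forecast formula is easy to evaluate directly. The sketch below assumes a zero-mean AR(1) with known φ and σ² and Gaussian errors, and ignores parameter-estimation uncertainty; the function name ar1_forecast and the last observed value are hypothetical. It uses the point forecast φʰ·xt together with the variance formula above, with z = 1.96 for an approximate 95% band (z = 1 would give roughly 68%).

```python
import math


def ar1_forecast(phi, sigma2, last_value, h, z=1.96):
    """Point forecast, band, and variance for a zero-mean AR(1), h steps ahead."""
    point = phi ** h * last_value
    variance = sigma2 * (1 - phi ** (2 * h)) / (1 - phi ** 2)
    half_width = z * math.sqrt(variance)
    return point, point - half_width, point + half_width, variance


phi, sigma2 = 0.3, 1.0
last_value = 1.0  # hypothetical last observation
unconditional = sigma2 / (1 - phi ** 2)

for h in (1, 2, 5, 10, 20):
    point, lower, upper, variance = ar1_forecast(phi, sigma2, last_value, h)
    share = variance / unconditional  # equals 1 - phi**(2h)
    print(f"h={h:2d}  forecast={point:+.5f}  "
          f"95% band=[{lower:+.3f}, {upper:+.3f}]  "
          f"share of unconditional variance={share:.4f}")
```

The point forecast shrinks geometrically toward zero while the band settles at the unconditional width, which is how the loss of predictive content appears in practice.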