Lesson 01
Financial time series carry temporal structure that cross-sectional data doesn't — autocorrelation, trends, and non-stationarity. Understanding these properties is the first step to building valid forecasting models.
Most machine learning datasets assume that rows are independent and identically distributed (i.i.d.). You can shuffle the training set, and the model doesn't care. A time series violates this assumption completely: row order encodes information. Yesterday's price directly influences today's; yesterday's volatility predicts tomorrow's.
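The point above can be demonstrated directly: shuffling a serially dependent series destroys the information encoded in its row order. A minimal sketch, using a simulated AR(1) process with an illustrative coefficient of 0.9 (not real market data):

```python
import numpy as np

rng = np.random.default_rng(0)

# A series with strong serial dependence: AR(1), x_t = 0.9 * x_{t-1} + noise.
# The 0.9 coefficient is an illustrative choice, not an empirical estimate.
n = 5000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + eps[t]

def lag1_corr(s):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(s[:-1], s[1:])[0, 1]

original = lag1_corr(x)                   # roughly 0.9: order matters
shuffled = lag1_corr(rng.permutation(x))  # roughly 0: shuffling erased it
```

The same values appear in both series; only the ordering differs, which is exactly the information an i.i.d. model throws away.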
In financial data, this serial dependency shows up everywhere. Prices trend. Volatility clusters. Spreads widen at the open. These patterns are not noise — they are the signal that forecasting models exploit.
Autocorrelation at lag k measures the correlation between a series and its own past: ACF(k) = Corr(x_t, x_{t-k}). For raw prices, ACF decays very slowly — prices are highly correlated with themselves because they trend. For returns, ACF is near zero at almost every lag in efficient markets.
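The contrast between slowly decaying price ACF and near-zero return ACF can be checked on simulated data. A sketch assuming prices follow a random walk in logs (a stand-in for the efficient-market case, not a claim about any real asset):

```python
import numpy as np

rng = np.random.default_rng(1)

def acf(x, k):
    """Sample autocorrelation at lag k: Corr(x_t, x_{t-k})."""
    return np.corrcoef(x[k:], x[:-k])[0, 1]

# Simulated log prices: i.i.d. log returns accumulated into a random walk.
log_returns = rng.normal(0.0, 0.01, size=5000)
log_prices = np.cumsum(log_returns)

price_acf = [acf(log_prices, k) for k in (1, 5, 20)]    # all near 1: slow decay
return_acf = [acf(log_returns, k) for k in (1, 5, 20)]  # all near 0
```

Real return series show small but nonzero autocorrelations; the simulation only illustrates the qualitative gap between levels and differences.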
Positive autocorrelation (ACF > 0) indicates momentum: a high return tends to be followed by another high return. Negative autocorrelation (ACF < 0) indicates mean reversion: a high return tends to be followed by a low return. Lag-1 negative ACF in high-frequency data often signals bid-ask bounce.
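The bid-ask bounce mechanism can be reproduced with a toy microstructure model: trades print alternately near the bid or the ask around a slow-moving mid price, so trade-to-trade returns flip sign. All parameters here (spread, mid volatility) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy bid-ask bounce: each trade prints at mid +/- half the spread,
# with the side (buyer- vs seller-initiated) drawn at random.
n = 10000
mid = 100.0 + np.cumsum(rng.normal(0.0, 0.01, size=n))  # slow-moving mid price
side = rng.choice([-1.0, 1.0], size=n)                  # +1 at ask, -1 at bid
half_spread = 0.05
trade_price = mid + side * half_spread

returns = np.diff(trade_price)
lag1 = np.corrcoef(returns[:-1], returns[1:])[0, 1]
# Bouncing between bid and ask induces strongly negative lag-1 autocorrelation
# in trade-price returns even though the mid itself has none.
```

This is one reason high-frequency studies often work with mid-price returns rather than trade-price returns.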
A lag-1 scatter plot places x_t on the x-axis and x_{t+1} on the y-axis. Each point represents a consecutive pair of observations. If the cloud is elongated along the diagonal, the series has positive autocorrelation. A circular cloud means no autocorrelation. An elongated cloud running anti-diagonal means negative autocorrelation.
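Constructing the lag-1 pairs is a one-line slicing operation, and the correlation of those pairs is exactly the lag-1 ACF. A sketch on a simulated momentum-like series (AR(1) with an illustrative coefficient of 0.6):

```python
import numpy as np

rng = np.random.default_rng(3)

# A momentum-like series: AR(1) with positive coefficient (illustrative).
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.normal()

# Consecutive pairs (x_t, x_{t+1}) are the points of the lag-1 scatter;
# plotting pairs_x against pairs_y would show a diagonally elongated cloud.
pairs_x, pairs_y = x[:-1], x[1:]

# The elongation along the diagonal is the correlation of the pairs,
# which equals the lag-1 autocorrelation (here roughly 0.6).
corr = np.corrcoef(pairs_x, pairs_y)[0, 1]
```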
Raw prices are non-stationary: they drift upward over time, their variance grows, and their distribution changes. Most forecasting models require stationarity. The standard transformation is the log return, r_t = log(P_t / P_{t-1}), which is approximately equal to the percentage return for small moves and has the critical property of being additive across time.
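Both properties of the log return are easy to verify numerically: for small moves it nearly matches the simple percentage return, and summing log returns over a window recovers the log of the total gross return, which simple returns do not. A sketch on a made-up price path:

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])  # illustrative prices

log_returns = np.diff(np.log(prices))
pct_returns = np.diff(prices) / prices[:-1]

# Small-move approximation: log(1 + r) is close to r, so the first
# log return (a ~1% move) is within a few basis points of 0.01.

# Additivity across time: the sum of log returns exponentiates back
# to the total gross return over the whole window.
total_from_sum = np.exp(log_returns.sum())
total_direct = prices[-1] / prices[0]
```

Additivity is what makes log returns convenient for aggregating daily returns into weekly or monthly ones: summation replaces compounding.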