Given a trained HMM and a sequence of observations, the Viterbi algorithm finds the single most likely hidden state sequence. This is how we decode which regime was active at each bar.
The two HMM problems we solve
Problem 1: Training (Baum-Welch). Given unlabelled market data, find the parameters λ = (π, A, B) that maximise the likelihood of the data. Uses expectation-maximisation (EM): alternates between computing soft state assignments (E-step) and updating parameters (M-step) until convergence.
Problem 2: Decoding (Viterbi). Given a trained model and observed returns, find the most likely hidden state sequence S* = argmax P(S | observations, λ). This is what runs on every new bar in production.
Viterbi algorithm — step by step
A toy 3-state HMM with 20 observations. The algorithm fills a trellis — a grid where each cell holds the probability of the best path to state k at time t. Watch it decode forward, then trace back the optimal path.
[Interactive demo: three panels — observations (returns) over 20 time steps, the Viterbi trellis of state probabilities at each time step, and the decoded state sequence (most likely path).]
Reading the trellis. Each column is a time step. Each row is a state (Bull, Chop, Bear). Brightness = log-probability of the best path arriving at that state. The bright cells form the winning path. At the end, we trace back from the most likely final state to reconstruct the full sequence.
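The fill-and-backtrack procedure fits in a few lines of NumPy. This is a sketch of the same toy 3-state setup; the parameters below are illustrative, not the trained model's values:

```python
import numpy as np

# Toy 3-state HMM (Bull, Chop, Bear) with Gaussian emissions on returns.
states = ["Bull", "Chop", "Bear"]
log_pi = np.log([0.4, 0.4, 0.2])           # initial state probabilities
log_A = np.log([[0.90, 0.08, 0.02],        # transition matrix (rows sum to 1)
                [0.10, 0.80, 0.10],
                [0.02, 0.08, 0.90]])
mu    = np.array([0.05, 0.00, -0.05])      # mean return per state
sigma = np.array([0.02, 0.01,  0.03])      # return volatility per state

def log_emission(x):
    """Log N(x; mu_k, sigma_k) for each state k."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def viterbi(obs):
    """Fill the trellis of best-path log-probs, then trace back."""
    T, K = len(obs), len(mu)
    trellis = np.empty((T, K))             # best log-prob ending in state k at time t
    back = np.zeros((T, K), dtype=int)     # argmax predecessor for backtracking
    trellis[0] = log_pi + log_emission(obs[0])
    for t in range(1, T):
        scores = trellis[t - 1][:, None] + log_A   # scores[i, j]: best path via i into j
        back[t] = scores.argmax(axis=0)
        trellis[t] = scores.max(axis=0) + log_emission(obs[t])
    path = [int(trellis[-1].argmax())]     # most likely final state
    for t in range(T - 1, 0, -1):          # trace back the winning path
        path.append(int(back[t][path[-1]]))
    return path[::-1], trellis

rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0.05, 0.02, 7),    # bull-like segment
                      rng.normal(-0.05, 0.03, 7),   # bear-like segment
                      rng.normal(0.00, 0.01, 6)])   # chop-like segment
path, trellis = viterbi(obs)
print([states[s] for s in path])
```

Working in log-probabilities avoids numerical underflow: over hundreds of bars, raw path probabilities shrink below floating-point precision.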
How training works: Baum-Welch
The model is initialised with random parameters, then iteratively refined. Each iteration is guaranteed not to decrease the likelihood of the training data — the algorithm converges to a local maximum.
hmmlearn handles this automatically. model.fit(features) runs Baum-Welch internally for up to n_iter=200 iterations. The result: a trained model with learned μ_k, Σ_k, and A_ij for all 7 states. In our system, training takes ~10–30 seconds on 17,000 bars.
Local optima. Baum-Welch finds a local, not global, maximum. Different random seeds produce different regime labellings (which state gets called "Bull Run" may vary). The auto-labelling step — identifying the high-return state post-training — handles this robustly by ranking states by their mean return rather than relying on a fixed state index.
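The auto-labelling step can be sketched in a few lines. The regime names and the mean-return vector below are illustrative (in hmmlearn the per-state means come from `model.means_`); only the rank-by-mean-return idea is the point:

```python
import numpy as np

def label_states(mean_returns):
    """Assign regime labels by ranking states on mean return (ascending)."""
    order = np.argsort(mean_returns)           # most bearish state first
    names = ["Bear Run", "Bear", "Weak Bear", "Chop",
             "Weak Bull", "Bull", "Bull Run"]  # 7 regimes, illustrative names
    return {int(state): names[rank] for rank, state in enumerate(order)}

# Mean returns from one training run; a different seed would permute the
# state indices, but the ranking (and hence the labels) is unaffected.
run_a = np.array([0.03, -0.03, 0.0, 0.01, -0.01, 0.06, -0.06])
print(label_states(run_a))
```

Because labels depend only on the ordering of mean returns, any permutation of state indices across seeds yields the same regime names.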
Current probability: the forward algorithm
In production, we run the forward algorithm on each new bar to get the current probability distribution over all 7 states. This is shown as the "confidence score" in the dashboard.
Normalised probabilities. The dashboard shows P(regime | data) — the forward probability for each state normalised to sum to 1. A value of 0.87 for "Bull Run" means: given everything we've observed up to this bar, there's an 87% probability we're in a bull regime. This is your confidence score.
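The normalised forward filter can be sketched in NumPy (hmmlearn exposes the same quantity via `model.predict_proba`). The 3-state parameters below are illustrative, not the trained 7-state model's values:

```python
import numpy as np

# Illustrative 3-state HMM (Bull, Chop, Bear) with Gaussian emissions.
pi = np.array([0.4, 0.4, 0.2])
A  = np.array([[0.90, 0.08, 0.02],
               [0.10, 0.80, 0.10],
               [0.02, 0.08, 0.90]])
mu    = np.array([0.05, 0.00, -0.05])
sigma = np.array([0.02, 0.01,  0.03])

def emission(x):
    """Gaussian density of observation x under each state."""
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def forward_filter(obs):
    """Return P(state_t | obs_1..t) for every t, renormalising each step."""
    alpha = pi * emission(obs[0])
    alpha /= alpha.sum()
    out = [alpha]
    for x in obs[1:]:
        alpha = (alpha @ A) * emission(x)   # predict, then weight by likelihood
        alpha /= alpha.sum()                # normalise so probabilities sum to 1
        out.append(alpha)
    return np.array(out)

probs = forward_filter(np.array([0.04, 0.05, 0.06, -0.04]))
print(probs[-1])   # current distribution over states after the latest bar
```

After three bull-like returns the filter concentrates on Bull; one sharp negative bar shifts most of the mass to Bear. The maximum entry of the final row is the dashboard's confidence score.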