The Paradigm Shift

For decades, computer vision has relied on handcrafted descriptors such as SIFT, SURF, and ORB. These algorithms search for specific mathematical patterns, such as "corners" or "blobs," to establish pixel correspondences.

Why Classic Features Fail

Heuristic-based detectors are binary thinkers: they either see a corner or they don't. When noise, low light, or extreme motion blur hides these features, the geometry collapses. There is no middle ground.

Classic CV (Heuristic)

Threshold → Contour → Moment Centroid → Integer Snap.
Result: Fragile at high tilts & noise.
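The heuristic pipeline above can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation: the `heuristic_centroid` name and the fixed threshold are assumptions chosen for the example. Note how every stage makes a hard decision, and how the final integer snap throws away sub-pixel information.

```python
import numpy as np

def heuristic_centroid(img: np.ndarray, thresh: float = 0.5) -> tuple[int, int]:
    """Classic pipeline sketch: threshold -> mask -> moment centroid -> integer snap."""
    mask = img > thresh                    # hard binary decision: feature pixel or not
    m00 = mask.sum()                       # zeroth-order moment (blob area)
    if m00 == 0:                           # nothing passed the threshold: detection
        raise ValueError("no feature found")  # fails outright, with no middle ground
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()          # first-order moments / m00 = centroid
    return int(round(cx)), int(round(cy))  # integer snap discards sub-pixel precision
```

If the threshold is slightly wrong, or blur bleeds the blob into the background, the mask (and therefore the centroid) changes discontinuously; this is the fragility the comparison refers to.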

Deep Homography (Neural)

Raw Pixel Stream → CNN Layers → Direct Regression Head.
Result: Robust; learns global geometric context.
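The neural pipeline can be sketched as a single forward pass: raw pixels through a convolutional layer, then a dense head that emits eight numbers. This toy version uses one random, untrained conv kernel and a naive NumPy convolution purely to show the data flow; a real network stacks many learned layers, and the `conv2d_valid` and `forward` names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive valid-mode 2D cross-correlation: one channel, one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def forward(img: np.ndarray, kernel: np.ndarray, w_dense: np.ndarray) -> np.ndarray:
    """Raw pixels -> conv + ReLU -> flatten -> dense regression head."""
    feat = np.maximum(conv2d_valid(img, kernel), 0.0)  # feature map
    return w_dense @ feat.ravel()                      # 8 continuous outputs

img = rng.random((21, 21))                 # toy 21x21 grid input
kernel = rng.standard_normal((3, 3))       # untrained stand-in for a learned filter
w_dense = rng.standard_normal((8, 19 * 19))
h = forward(img, kernel, w_dense)          # h.shape == (8,)
```

The essential point is the output type: a vector of eight continuous values rather than a yes/no detection.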

Convolutional Regression

Instead of manually creating rules for what a "corner" looks like, we use a Convolutional Neural Network (CNN). A CNN learns a hierarchy of features—from simple edges to complex shapes like the nested squares in our 21x21 grid.

The network doesn't output a "yes/no" detection. It performs regression, outputting 8 continuous floating-point numbers that represent the degrees of freedom in our homography matrix. (A homography has 9 entries but only 8 degrees of freedom, because it is defined only up to scale; the ninth entry is conventionally fixed to 1.)
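Turning the regression head's output back into a usable matrix is a simple reshaping step. A minimal sketch, assuming the direct 8-parameter parameterization with the bottom-right entry fixed to 1 (the function names here are illustrative):

```python
import numpy as np

def params_to_homography(p: np.ndarray) -> np.ndarray:
    """Pack 8 regressed values into a 3x3 homography, fixing H[2, 2] = 1."""
    return np.append(p, 1.0).reshape(3, 3)

def warp_point(H: np.ndarray, x: float, y: float) -> tuple[float, float]:
    """Apply H to a 2D point in homogeneous coordinates (projective divide)."""
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]

# The identity parameters map every point to itself:
H = params_to_homography(np.array([1.0, 0, 0, 0, 1.0, 0, 0, 0]))
```

Many deep-homography papers instead regress the displacements of four corner points (the "4-point" parameterization) because it is better conditioned for training; either way, eight numbers fully determine $H$.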

"A neural network perceives the 'gestalt' of the grid. Even if half the finder pattern is blurred, the network can still infer the exact sub-pixel center by looking at the remaining visible pixels."

Figure: Feature Extraction Flow — Pixels → CNN → Dense → Matrix $H$.