In high-resolution vision systems, image coordinates often range from 0 to 8000 pixels. When these values are plugged into a DLT matrix, the entries can span several orders of magnitude (e.g., $10^0$ for the constant terms vs. roughly $10^7$ for products of coordinates).
This disparity causes the matrix $A^T A$ to have a very high condition number, making the SVD solution extremely sensitive to even tiny amounts of noise. The result is "jittery" or wildly incorrect homographies.
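To make the disparity concrete, here is a minimal sketch (the row layout follows the standard two-rows-per-correspondence DLT formulation; the specific coordinate values are our illustrative assumptions) that builds one DLT row for a correspondence $(x, y) \to (u, v)$ at raw pixel scale and measures the spread of its nonzero entries:

```python
import numpy as np

# One DLT row for the correspondence (x, y) -> (u, v), raw pixel coordinates.
x, y, u, v = 7500.0, 6000.0, 7600.0, 5900.0
row = np.array([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])

nz = np.abs(row[row != 0])  # magnitudes of the nonzero entries
print(nz.min(), nz.max())   # 1.0 vs ~5.7e7: nearly 8 orders of magnitude apart
```

Squaring these magnitudes when forming $A^T A$ doubles the spread, which is exactly why the condition number explodes.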
The standard solution is to normalize the points before estimation. This is achieved by a transformation matrix $T$ that shifts the centroid to the origin and scales the points so their average distance from the origin is $\sqrt{2}$.
After finding the normalized homography $\tilde{H}$, it must be de-normalized as $H = T'^{-1} \tilde{H} T$, where $T$ and $T'$ are the normalizing transforms for the source and target point sets, respectively.
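The whole normalize–estimate–de-normalize pipeline can be sketched end to end as below (a minimal illustration, not a production implementation: function names are ours, the DLT rows follow the standard two-rows-per-correspondence formulation, and the test homography is an arbitrary assumed example):

```python
import numpy as np

def norm_transform(pts):
    # Similarity transform: centroid -> origin, mean distance -> sqrt(2).
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])

def homography_dlt(src, dst):
    # Normalize both point sets: T for the source, T' for the target.
    T, Tp = norm_transform(src), norm_transform(dst)
    src_n = (T @ np.column_stack([src, np.ones(len(src))]).T).T
    dst_n = (Tp @ np.column_stack([dst, np.ones(len(dst))]).T).T
    # Two DLT rows per correspondence, built from normalized coordinates.
    A = []
    for (x, y, _), (u, v, _) in zip(src_n, dst_n):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H_tilde = Vt[-1].reshape(3, 3)       # right singular vector of smallest sigma
    H = np.linalg.inv(Tp) @ H_tilde @ T  # de-normalize: H = T'^{-1} H~ T
    return H / H[2, 2]                   # fix the overall scale

# Round-trip check: recover a known homography from exact correspondences.
H_true = np.array([[1.1, 0.02, 40.0],
                   [-0.01, 0.95, -25.0],
                   [1e-5, 2e-5, 1.0]])
src = np.random.default_rng(1).uniform(0.0, 8000.0, (12, 2))
src_h = np.column_stack([src, np.ones(12)])
dst_h = (H_true @ src_h.T).T
dst = dst_h[:, :2] / dst_h[:, 2:]
H_est = homography_dlt(src, dst)
print(np.allclose(H_est, H_true, atol=1e-4))  # True
```

Note that because the correspondences here are noise-free, the raw (unnormalized) DLT may also succeed; the normalization pays off precisely when the measurements are noisy, as described above.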