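In central vision, we need only 4 point correspondences to estimate a homography. In non-central vision (e.g., multi-camera rigs, whose viewing rays do not pass through a single center of projection), estimating the homography induced by a planar scene requires a more complex solver.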
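For comparison with the non-central case, here is a minimal sketch of the central 4-point DLT. It assumes Python with NumPy, and the function name `dlt_homography` is chosen only for illustration. Each correspondence $(x, y) \leftrightarrow (u, v)$ contributes two linear equations in the 9 entries of $H$. The homography is the null vector of the stacked system.

```python
import numpy as np


def dlt_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 point pairs via DLT.

    src, dst: arrays of shape (N, 2) holding matching 2D points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 matching 2D points")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 . p) / (h3 . p) and v = (h2 . p) / (h3 . p), cross-multiplied.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # Right singular vector for the smallest singular value minimizes ||A h||.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] != 0)
```

The sketch omits Hartley's coordinate normalization, which is advisable with noisy pixel coordinates.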
The motion of a generalized camera relative to a plane can be estimated from 16 point correspondences. This "minimal" set (minimal for the linear formulation, not in terms of the number of unknowns) allows us to recover the relative 6-DOF pose ($R, \mathbf{t}$) and the plane parameters (normal $\mathbf{n}$ and distance $d$).
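To make the unknowns concrete, here is a short derivation under stated assumptions. A point has coordinates $\mathbf{X}_1$ and $\mathbf{X}_2$ in the rig frame at the two poses, with $\mathbf{X}_2 = R\mathbf{X}_1 + \mathbf{t}$. The plane is $\mathbf{n}^\top\mathbf{X}_1 = d$ with $\|\mathbf{n}\| = 1$. For points on the plane, $\mathbf{n}^\top\mathbf{X}_1 / d = 1$, so the translation folds into a single $3 \times 3$ map:

$$
\mathbf{X}_2 = R\,\mathbf{X}_1 + \mathbf{t}\,\frac{\mathbf{n}^\top\mathbf{X}_1}{d}
= \left(R + \frac{\mathbf{t}\,\mathbf{n}^\top}{d}\right)\mathbf{X}_1,
\qquad
\underbrace{3}_{R} + \underbrace{3}_{\mathbf{t}} + \underbrace{3}_{\mathbf{n},\,d} = 9 \text{ DOF}.
$$

Because the offsets between the rig's cameras are known in metric units, the translation scale is in general observable for a non-central rig. This is why $\mathbf{t}$ counts for 3 DOF here rather than 2 as in the central case. A 16-correspondence sample therefore exceeds the 9 DOF: it is what the linear formulation needs, not a minimal sample in the strict sense.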
The underlying constraint is the Generalized Epipolar Constraint (GEC), which relates corresponding rays expressed as Plücker lines, specialized here to the planar (homography) case.
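The following sketch checks the GEC numerically on a synthetic planar scene, using Python with NumPy. It assumes one common sign convention. Each ray is a Plücker line $(\mathbf{q}, \mathbf{q}')$ with unit direction $\mathbf{q}$ and moment $\mathbf{q}' = \mathbf{c} \times \mathbf{q}$ for camera center $\mathbf{c}$, and points map as $\mathbf{X}_2 = R\mathbf{X}_1 + \mathbf{t}$. Under these assumptions the constraint, which states that the two rays intersect, reads

$$\mathbf{q}_2^\top [\mathbf{t}]_\times R\,\mathbf{q}_1 + \mathbf{q}_2^\top R\,\mathbf{q}'_1 + \mathbf{q}'^\top_2 R\,\mathbf{q}_1 = 0.$$

All function names and scene parameters are illustrative. Other sign conventions for the moment or the pose flip signs in this expression.

```python
import numpy as np


def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    k = np.asarray(axis, dtype=float)
    K = skew(k / np.linalg.norm(k))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def plucker(center, direction):
    """Plücker line (q, q') with unit direction q and moment q' = c x q."""
    q = direction / np.linalg.norm(direction)
    return q, np.cross(center, q)


def gec_residual(R, t, line1, line2):
    """GEC residual, assuming X2 = R @ X1 + t; zero when the rays intersect."""
    q1, m1 = line1
    q2, m2 = line2
    return q2 @ skew(t) @ R @ q1 + q2 @ R @ m1 + m2 @ R @ q1


def planar_scene(R, t, n, d, centers, num_points=20, seed=0):
    """Ray pairs observing points on the plane n . X1 = d (n must be unit length).

    `centers` are the rig's camera centers in the rig frame; point i is seen by
    camera i % len(centers) at pose 1 and camera (i + 1) % len(centers) at pose 2.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal pair of in-plane directions.
    u = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(n, [0.0, 1.0, 0.0])
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)
    pairs = []
    for i in range(num_points):
        a, b = rng.uniform(-2.0, 2.0, size=2)
        X1 = d * n + a * u + b * v  # point on the plane, pose-1 rig frame
        X2 = R @ X1 + t             # same point, pose-2 rig frame
        c1 = centers[i % len(centers)]
        c2 = centers[(i + 1) % len(centers)]
        pairs.append((plucker(c1, X1 - c1), plucker(c2, X2 - c2)))
    return pairs
```

Matching ray pairs give a residual at machine precision, while mismatched pairs generally do not. The residual is linear in the entries of $[\mathbf{t}]_\times R$ and $R$, which is what makes a linear, DLT-style formulation possible.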
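By introducing 16 intermediate variables, the problem becomes linear: stacking one equation per correspondence gives a matrix $A$ of size $N \times 16$ and a vector $\mathbf{x}$ of intermediate variables with $A\mathbf{x} = \mathbf{0}$. As in the DLT, we take $\mathbf{x}$ to be the right singular vector of $A$ associated with its smallest singular value. Embedding this linear solver in RANSAC, with samples of 16 correspondences, gives the 16-point RANSAC framework.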
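The SVD step and the surrounding RANSAC loop can be sketched generically in Python with NumPy. This rests on one assumption: each correspondence has already been turned into one 16-entry row of $A$ by the planar parameterization, which is not spelled out here, so the sketch operates on rows directly. The names `null_vector` and `ransac_linear`, the threshold, and the iteration count are hypothetical choices.

```python
import numpy as np


def null_vector(A):
    """Unit x minimizing ||A x||: right singular vector of the smallest singular value."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]


def ransac_linear(rows, sample_size=16, threshold=1e-6, iterations=500, seed=0):
    """Generic RANSAC over a homogeneous linear constraint a_i . x = 0.

    rows: (N, 16) array; row i is the (hypothetical) linearized constraint
    contributed by correspondence i. Returns (x, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1)
    best_mask = np.zeros(len(rows), dtype=bool)
    for _ in range(iterations):
        # Hypothesis from a random sample of correspondences.
        sample = rng.choice(len(rows), size=sample_size, replace=False)
        x = null_vector(rows[sample])
        # Algebraic residual per correspondence, normalized by row length.
        mask = np.abs(rows @ x) / norms < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < sample_size:
        raise RuntimeError("no consensus set found")
    # Refit on all inliers with the same SVD step.
    return null_vector(rows[best_mask]), best_mask
```

Recovering $R$, $\mathbf{t}$, $\mathbf{n}$ and $d$ from the 16-vector is a separate step that depends on how the intermediate variables are defined, and it is not shown. The tiny threshold on the normalized algebraic residual suits noise-free synthetic rows. With real data it would need tuning, or a geometric error could replace it.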