Once our TensorFlow.js model is trained, we have a system that can take a raw video frame and output 8 precise floating-point numbers: the (x, y) coordinates of the four corners. To make this data useful, we must bridge into OpenCV.js for the final geometric reconstruction.
We take the four corner (x, y) pairs predicted by our network and pass them to cv.getPerspectiveTransform alongside our "ideal" square coordinates. This yields the 3x3 homography matrix $H$.
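Under the hood, cv.getPerspectiveTransform solves a linear system for the eight unknown entries of $H$ (the ninth, $H_{33}$, is fixed to 1). A minimal plain-JavaScript sketch of that computation, with all function names hypothetical rather than part of OpenCV.js:

```javascript
// Sketch: compute the 3x3 homography H mapping 4 source points to 4
// destination points, as cv.getPerspectiveTransform does internally.
// Each correspondence (x, y) -> (u, v) contributes two linear equations.
function getPerspectiveTransform(src, dst) {
  const A = [], b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = src[i], [u, v] = dst[i];
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]); b.push(v);
  }
  const h = solve(A, b);
  return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

// Gauss-Jordan elimination with partial pivoting on the augmented matrix.
function solve(A, b) {
  const n = b.length, M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let piv = col;
    for (let r = col + 1; r < n; r++)
      if (Math.abs(M[r][col]) > Math.abs(M[piv][col])) piv = r;
    [M[col], M[piv]] = [M[piv], M[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / M[i][i]);
}
```

In the real pipeline, src would be the network's predicted corners and dst the ideal square; OpenCV.js does the same solve inside cv.getPerspectiveTransform.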
Traditional image warping (cv.warpPerspective) resamples source pixels into a destination grid. Its default bilinear interpolation averages colors across adjacent pixels, which destroys the colorimetric precision we need.
Instead, we use inverse mapping. We invert $H$ to get $H^{-1}$. For every cell in our 21x21 ideal grid, we mathematically map its center back to its exact sub-pixel location in the *original* raw frame.
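The inverse-mapping step above can be sketched in plain JavaScript. This is an illustrative implementation, not the article's actual code; the helper names and the cellSize parameter are assumptions, and a 3x3 inverse is computed directly via the adjugate:

```javascript
// Invert a 3x3 matrix via its adjugate (transposed cofactors) / determinant.
function invert3x3(m) {
  const [[a, b, c], [d, e, f], [g, h, i]] = m;
  const det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
  const adj = [
    [e * i - f * h, c * h - b * i, b * f - c * e],
    [f * g - d * i, a * i - c * g, c * d - a * f],
    [d * h - e * g, b * g - a * h, a * e - b * d],
  ];
  return adj.map(row => row.map(v => v / det));
}

// Project a point through a homography (homogeneous divide by w).
function project(M, x, y) {
  const w = M[2][0] * x + M[2][1] * y + M[2][2];
  return [
    (M[0][0] * x + M[0][1] * y + M[0][2]) / w,
    (M[1][0] * x + M[1][1] * y + M[1][2]) / w,
  ];
}

// Map the center of every cell in the 21x21 ideal grid back through
// H^-1 to its sub-pixel location in the raw frame.
function gridCenters(H, cellSize = 10) {
  const Hinv = invert3x3(H);
  const centers = [];
  for (let row = 0; row < 21; row++) {
    for (let col = 0; col < 21; col++) {
      const cx = (col + 0.5) * cellSize; // cell center in ideal space
      const cy = (row + 0.5) * cellSize;
      centers.push(project(Hinv, cx, cy));
    }
  }
  return centers;
}
```

Because each cell center lands on one exact sub-pixel coordinate in the raw frame, the color can be sampled there directly, with no destination-grid interpolation involved.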
Direct Sub-pixel Coordinate Mapping