Our 21×21 grid requires sub-pixel precision. If a corner estimate is off by just 0.5 pixels, the resulting drift in the interpolated coordinate system makes accurate color sampling impossible.
While classic CV methods "snap" to the nearest integer pixel, a neural network can output continuous floating-point values. Trained on perfectly labeled synthetic data, it learns to map internal feature intensities directly to sub-pixel coordinates.
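The key advantage of synthetic data is that the regression target is exact by construction: we place the corner at a known float location, render it, and the label costs nothing. A minimal sketch of that idea in NumPy (the rendering function, patch size, and supersampling factor are illustrative choices, not the project's actual pipeline):

```python
import numpy as np

def render_corner(cx, cy, size=32, supersample=8):
    """Render an anti-aliased dark corner whose apex sits at the
    sub-pixel location (cx, cy). Supersampling approximates the
    smooth intensity gradients a real camera would produce."""
    s = size * supersample
    ys, xs = np.mgrid[0:s, 0:s] / supersample
    # dark region where x >= cx and y >= cy (a square corner)
    img = ((xs >= cx) & (ys >= cy)).astype(np.float64)
    # average the supersamples back down to size x size
    img = img.reshape(size, supersample, size, supersample).mean(axis=(1, 3))
    return 1.0 - img  # white background, dark corner

# Perfectly labeled sample: the float label comes for free.
cx, cy = 15.37, 16.81          # ground-truth sub-pixel apex
patch = render_corner(cx, cy)  # network input
label = np.array([cx, cy])     # regression target
```

Because the edge pixels take fractional intensities (e.g. a pixel 37% covered by the corner is 37% dark), the network has a smooth signal from which to recover the fractional part of the coordinate.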
State-of-the-art models for corner detection (like LSCCL) don't just output coordinates. They predict two components simultaneously: typically a coarse heatmap of candidate locations and a sub-pixel offset that refines each peak. And because the input has already been binarized with AdaptiveThreshold, the network is never "distracted" by the aliased edges of the colored cells. It learns the smooth, continuous geometry of the perspective transformation.
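For readers unfamiliar with that preprocessing step: OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` compares each pixel to the mean of its local neighborhood, which flattens the colored cells into clean black-and-white structure before the image ever reaches the network. A rough NumPy equivalent (the function name and defaults are mine, for illustration):

```python
import numpy as np

def adaptive_threshold(gray, block=11, C=2.0):
    """Mean-based adaptive threshold: keep a pixel white if it is
    brighter than the mean of its block x block neighborhood minus C.
    This mirrors the idea behind cv2.adaptiveThreshold with
    ADAPTIVE_THRESH_MEAN_C."""
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # summed-area table gives every local window sum in O(1)
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(0).cumsum(1)
    window_sum = (ii[block:, block:] - ii[:-block, block:]
                  - ii[block:, :-block] + ii[:-block, :-block])
    local_mean = window_sum / (block * block)
    return np.where(gray > local_mean - C, 255, 0).astype(np.uint8)

# A dark corner mark on a bright field survives; gradual shading does not.
img = np.full((20, 20), 200.0)
img[10, 10] = 50.0
binary = adaptive_threshold(img, block=5, C=2.0)
```

Because the comparison is local, the same threshold logic works whether a cell is bright yellow or dark blue, which is exactly why the network only ever sees geometry, not color.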