You have the architecture and the data, but the network isn't converging. A training loss plateauing at around 0.01 for a coordinate regression task is a classic symptom of Model Collapse—the network has given up and is simply predicting the "average" shape for every input.
In a 256x256 pixel normalized space, an MSE of 0.01 corresponds to an RMSE of 0.1, i.e. 10% of the normalized range. Your network is guessing, on average, 25.6 pixels away from the true corner: a total failure for precise corner localization.
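To see why that number is so bad, convert the loss back into pixels. mseToPixelError is a hypothetical helper for illustration, not a library call:

```javascript
// Sanity-check what a given MSE means in pixels when coordinates
// are normalized to [0, 1] over a square canvas.
function mseToPixelError(mse, canvasSize) {
  const rmse = Math.sqrt(mse); // RMSE in normalized [0, 1] units
  return rmse * canvasSize;    // average error back in pixels
}

mseToPixelError(0.01, 256); // 25.6 pixels off, on average
```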
If your learning rate is too high (e.g., 0.001), the network takes steps that overshoot the target, continuously bouncing around the narrow valley containing the optimum without ever dropping in.
The Fix: Manually drop the learning rate to 0.0001 or 0.00001 when the plateau begins, e.g. optimizer: tf.train.adam(0.0001).
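In TensorFlow.js you apply the drop by rebuilding the optimizer with the lower rate (tf.train.adam(0.0001)). The drop itself can be sketched as a plain step schedule; nextLearningRate is a hypothetical helper, not a library API:

```javascript
// Step-decay sketch: cut the learning rate 10x whenever a plateau is
// detected, clamped at a floor so it never vanishes entirely.
function nextLearningRate(current, plateauDetected, floor = 1e-5) {
  if (!plateauDetected) return current;
  return Math.max(current / 10, floor);
}

let lr = 0.001;
lr = nextLearningRate(lr, true); // dropped 10x once the plateau is seen
lr = nextLearningRate(lr, true); // drops again, clamped at the floor thereafter
```

How you detect the plateau (e.g. no validation-loss improvement over N epochs) is up to your training loop.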
Early in training, random 3D perspective tilts produce massive errors; MSE squares them, creating gradient spikes that destabilize the weights.
The Fix: Use Huber loss (tf.losses.huberLoss). It acts linearly on large outlier errors (preventing spikes) but quadratically, like MSE, as errors approach zero, preserving sub-pixel precision in the final phase.
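The per-element behavior of Huber loss can be sketched in scalar form, assuming the common default delta of 1.0 (the threshold where it switches from quadratic to linear):

```javascript
// Scalar Huber loss sketch: quadratic near zero, linear beyond delta.
function huber(error, delta = 1.0) {
  const abs = Math.abs(error);
  return abs <= delta
    ? 0.5 * error * error          // MSE-like: keeps sub-pixel precision
    : delta * (abs - 0.5 * delta); // linear: no gradient spike on outliers
}

huber(0.1); // ~0.005 (quadratic region)
huber(5.0); // 4.5 (linear region; the 0.5*e^2 quadratic would give 12.5)
```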
Neural networks struggle when inputs and targets live on disparate scales. Always scale your input pixels to [0, 1] (divide by 255) and your target coordinates to [0, 1] (divide by the canvas dimensions).
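A minimal sketch of that scaling; normalizePixel and normalizeCorner are hypothetical helper names, not library calls:

```javascript
// Scale a raw pixel intensity from [0, 255] into [0, 1].
function normalizePixel(value) {
  return value / 255;
}

// Scale a corner's pixel coordinates into [0, 1] by the canvas size.
function normalizeCorner([x, y], width, height) {
  return [x / width, y / height];
}

normalizePixel(255);                  // 1
normalizeCorner([128, 64], 256, 256); // [0.5, 0.25]
```

Remember to apply the inverse (multiply by the canvas dimensions) when drawing predicted corners back onto the image.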
Coordinate regression must be able to output negative values (or values above 1.0) if a corner is pulled off-canvas. Never use relu or sigmoid on your final Dense layer; it must stay 'linear'.
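A sketch of that output head in TensorFlow.js, assuming a four-corner target flattened into 8 values (the layer size is illustrative):

```javascript
import * as tf from "@tensorflow/tfjs";

// Final regression head: 'linear' is the identity activation, so outputs
// can go negative or past 1.0 when a corner leaves the canvas.
// relu would clamp negatives to 0; sigmoid would trap outputs in (0, 1).
const head = tf.layers.dense({
  units: 8,             // 4 corners x 2 coordinates (x, y)
  activation: "linear",
});
```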
Sanity checklist:
- Pixels (0-255 vs 0-1.0): PASS (Scaled)
- Final layer activation function: Linear (True Regression)