
3a. The Learning Process (Gradient Descent)

Calculus inside the Network

Before diving into complex multi-layer WebGPU renderers, it helps to understand exactly what a neural network does mathematically when it learns.

Click through the interactive flowchart on the right. You are manually stepping a 1-parameter mathematical engine. Watch how the glowing arrows highlight the direction of data flow:

Why Mean Squared Error (MSE)?

Notice that we use MSE, (Prediction − Target)², instead of the absolute difference. Why?

  • Heavy Penalties: Squaring heavily penalizes large errors (an error of 10 becomes 100). This pushes the network to prioritize fixing its worst predictions first.
  • Direction Agnostic: A negative mistake squared becomes positive. It prevents positive and negative errors from mathematically canceling each other out.
  • Smooth Gradients: Most crucially for calculus, squaring produces a smooth parabola: it is differentiable everywhere, has exactly one valley (a single global minimum), and has none of the sharp kinks of the absolute difference (which is not differentiable at zero). This lets the gradient slide steadily down the hill toward the minimum.
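The three properties above can be checked in a few lines of plain Python (a minimal sketch; the function names `mse` and `mse_grad` are illustrative, not from the module):

```python
def mse(pred, target):
    """Mean squared error for a single prediction."""
    return (pred - target) ** 2

def mse_grad(pred, target):
    # d/d_pred (pred - target)^2 = 2 * (pred - target): smooth everywhere,
    # unlike |pred - target|, whose derivative jumps at zero.
    return 2 * (pred - target)

# Heavy penalties: an error of 10 costs 100, not 10.
assert mse(10.0, 0.0) == 100.0

# Direction agnostic: +3 and -3 errors cost the same.
assert mse(3.0, 0.0) == mse(-3.0, 0.0) == 9.0

# Smooth gradients: the gradient shrinks as we approach the minimum,
# so steps naturally become gentler near the valley floor.
print(mse_grad(0.1, 0.0))  # 0.2
```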

Key Definitions

  • Epoch: One complete pass of the training cycle (forward pass, loss, backward pass, update). In this simulation, clicking through all 4 phases completes exactly 1 full Epoch.
  • Learning Rate (α): The gradient tells us which direction to step; the Learning Rate dictates how big a step to take. Too small = painfully slow learning. Too large = the weight overshoots the minimum, and updates can oscillate or explode.
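One full epoch of this simulation can be sketched in Python, using the values shown in the flowchart (x = 2.0, w = 0.50, target y = 10.0, α = 0.01):

```python
# One gradient-descent step for the 1-parameter model y' = w * x.
x, w, y, lr = 2.0, 0.50, 10.0, 0.01   # lr is the learning rate α

y_pred = w * x                 # forward pass: 1.0
loss = (y_pred - y) ** 2       # MSE loss: (1 - 10)^2 = 81.0
grad = 2 * (y_pred - y) * x    # chain rule: dL/dw = 2(y' - y) * x = -36.0
w = w - lr * grad              # update: 0.5 - 0.01 * (-36) -> w moves to ~0.86
```

Note that the negative gradient (−36) means the loss decreases as w increases, so the update correctly pushes w upward toward the value (w = 5) that makes the prediction hit the target.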

Forward Pass: Awaiting Initialization

Interactive flowchart (initial state):

  • Input (x) = 2.0 → multiplied by Weight (w) = 0.50 → Prediction (y′) = ?
  • MSE Loss = (y′ − y)², with Target (y) = 10.0; loss = ?, gradient dL/dw = ?
  • Cycle: Epoch 0; Learning Rate (α) = 0.01 (adjustable)

MSE Loss History: As the model learns, this error curve should trend downwards. A lower score means higher performance.
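Repeating the epoch cycle in a loop reproduces the downward-trending loss curve that the history chart plots (a minimal sketch using the same values as the simulation; the epoch count of 50 is an arbitrary choice for illustration):

```python
# Repeated gradient-descent epochs for y' = w * x, tracking the MSE history.
x, y, lr = 2.0, 10.0, 0.01
w = 0.50
history = []

for epoch in range(50):
    y_pred = w * x                # forward pass
    loss = (y_pred - y) ** 2      # MSE loss
    history.append(loss)
    grad = 2 * (y_pred - y) * x   # backward pass: dL/dw
    w -= lr * grad                # update

print(history[0], history[-1])    # loss starts at 81.0 and shrinks toward 0
```

With this smooth parabolic loss and a well-chosen learning rate, each step strictly reduces the error, which is exactly the downward trend the history chart should show.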