
3a. The Learning Process (Gradient Descent)

Calculus inside the Network

Before diving into complex multi-layer WebGPU renderers, it helps to understand exactly what a neural network does mathematically when it learns.

Click through the interactive flowchart on the right. You are manually stepping a 1-parameter mathematical engine. Watch how the glowing arrows highlight the direction of data flow:

Why Mean Squared Error (MSE)?

Notice that we use MSE, (Prediction − Target)², instead of the absolute difference. Why?

  • Heavy Penalties: Squaring heavily penalizes large errors (an error of 10 becomes 100). This pushes the network to prioritize fixing its worst predictions first.
  • Direction Agnostic: A negative mistake squared becomes positive. It prevents positive and negative errors from mathematically canceling each other out.
  • Smooth Gradients: Most crucially for calculus, squaring produces a smooth parabola: it is differentiable everywhere, has exactly one valley (a single global minimum), and has none of the sharp kinks of the absolute difference (which is not differentiable at zero). This lets the gradient slide steadily down the hill toward the minimum.
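The three properties above can be checked in a few lines of plain Python (a minimal sketch; the function names `mse` and `mse_grad` are illustrative, not from the module):

```python
def mse(pred, target):
    """Mean squared error for a single prediction."""
    return (pred - target) ** 2

def mse_grad(pred, target):
    # d/d_pred (pred - target)^2 = 2 * (pred - target): smooth everywhere,
    # unlike |pred - target|, whose derivative jumps at zero.
    return 2 * (pred - target)

# Heavy penalties: an error of 10 costs 100, not 10.
assert mse(10.0, 0.0) == 100.0

# Direction agnostic: +3 and -3 errors cost the same.
assert mse(3.0, 0.0) == mse(-3.0, 0.0) == 9.0

# Smooth gradients: the gradient shrinks as we approach the minimum,
# so steps naturally become gentler near the valley floor.
print(mse_grad(0.1, 0.0))  # 0.2
```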

Key Definitions

  • Epoch: One complete pass of the training cycle (forward pass, loss, backward pass, update). In this simulation, clicking through all 4 phases completes exactly 1 full Epoch.
  • Learning Rate (α): The gradient tells us which direction to step; the Learning Rate dictates how big a step to take. Too small = painfully slow learning. Too large = the weight overshoots the minimum, and updates can oscillate or explode.
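One full epoch of this simulation can be sketched in Python, using the values shown in the flowchart (x = 2.0, w = 0.50, target y = 10.0, α = 0.01):

```python
# One gradient-descent step for the 1-parameter model y' = w * x.
x, w, y, lr = 2.0, 0.50, 10.0, 0.01   # lr is the learning rate α

y_pred = w * x                 # forward pass: 1.0
loss = (y_pred - y) ** 2       # MSE loss: (1 - 10)^2 = 81.0
grad = 2 * (y_pred - y) * x    # chain rule: dL/dw = 2(y' - y) * x = -36.0
w = w - lr * grad              # update: 0.5 - 0.01 * (-36) -> w moves to ~0.86
```

Note that the negative gradient (−36) means the loss decreases as w increases, so the update correctly pushes w upward toward the value (w = 5) that makes the prediction hit the target.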

Forward Pass: Awaiting Initialization

Interactive flowchart (initial state):

  • Input (x) = 2.0 → multiplied by Weight (w) = 0.50 → Prediction (y′) = ?
  • MSE Loss = (y′ − y)², with Target (y) = 10.0; loss = ?, gradient dL/dw = ?
  • Cycle: Epoch 0; Learning Rate (α) = 0.01 (adjustable)

MSE Loss History: As the model learns, this error curve should trend downwards. A lower score means higher performance.
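Repeating the epoch cycle in a loop reproduces the downward-trending loss curve that the history chart plots (a minimal sketch using the same values as the simulation; the epoch count of 50 is an arbitrary choice for illustration):

```python
# Repeated gradient-descent epochs for y' = w * x, tracking the MSE history.
x, y, lr = 2.0, 10.0, 0.01
w = 0.50
history = []

for epoch in range(50):
    y_pred = w * x                # forward pass
    loss = (y_pred - y) ** 2      # MSE loss
    history.append(loss)
    grad = 2 * (y_pred - y) * x   # backward pass: dL/dw
    w -= lr * grad                # update

print(history[0], history[-1])    # loss starts at 81.0 and shrinks toward 0
```

With this smooth parabolic loss and a well-chosen learning rate, each step strictly reduces the error, which is exactly the downward trend the history chart should show.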