1. Four Stages, One Pipeline

splatBlobs() is the main rendering function, called once per pixel. After resetting the shared counters, it proceeds in four stages. Each stage consumes a typed "list token" and, except for the last, returns the next one, so the correct calling order is enforced at compile time:

[Differentiable]
float4 splatBlobs(uint2 dispatchThreadID, int2 dispatchSize)
{
    // tileBounds, localIdx and uv are set up before this point (omitted from this excerpt)

    // Setup: reset shared counters
    InitializedShortList sList = initShortList(dispatchThreadID);

    // Stage 1. Gather Gaussians that overlap this tile
    FilledShortList filledSList = coarseRasterize(sList, tileBounds, localIdx);

    // Stage 2. Pad to power-of-2 for bitonic sort
    PaddedShortList paddedSList = padBuffer(filledSList, localIdx);

    // Stage 3. Sort by depth (bitonic sort)
    SortedShortList sortedList = bitonicSort(paddedSList, localIdx);

    // Stage 4. Blend sorted Gaussians into final pixel colour
    float4 color = fineRasterize(sortedList, localIdx, uv);

    return float4(color.rgb * (1.0 - color.a) + color.a, 1.0);
}

The four pipeline stages with their phantom type tokens. Arrow colours show data flow; the amber boundary marks where differentiability begins.

Phantom types enforce stage order. InitializedShortList, FilledShortList, PaddedShortList, and SortedShortList are each an effectively empty struct (written { 0 } in the code). They carry no data; shared memory holds the actual short-list. Their only purpose is to make it a compile-time error to call bitonicSort before padBuffer, or fineRasterize before sorting. The type checker becomes a pipeline sequencer.
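The same idea can be sketched outside the shader. The Python below is only an analogue, not the Slang code: the class names mirror the tokens above, while short_list, the function bodies, and the sentinel value are hypothetical stand-ins. Python does not reject a wrong call order at runtime, so the enforcement here relies on running a static checker such as mypy.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the Slang token structs; they hold no data.
@dataclass(frozen=True)
class InitializedShortList: pass

@dataclass(frozen=True)
class FilledShortList: pass

@dataclass(frozen=True)
class PaddedShortList: pass

@dataclass(frozen=True)
class SortedShortList: pass

# The actual list lives outside the tokens (shared memory in the shader);
# a module-level list plays that role in this sketch.
short_list: List[int] = []

def init_short_list() -> InitializedShortList:
    short_list.clear()
    return InitializedShortList()

def coarse_rasterize(_: InitializedShortList, candidates: List[int]) -> FilledShortList:
    short_list.extend(candidates)
    return FilledShortList()

def pad_buffer(_: FilledShortList, sentinel: int = 2**31 - 1) -> PaddedShortList:
    # Grow the list to the next power of two using a large sentinel key.
    size = 1
    while size < len(short_list):
        size *= 2
    short_list.extend([sentinel] * (size - len(short_list)))
    return PaddedShortList()

def sort_short_list(_: PaddedShortList) -> SortedShortList:
    short_list.sort()  # stand-in for the bitonic sort
    return SortedShortList()

# Each call only type-checks if it receives the previous stage's token.
token = sort_short_list(pad_buffer(coarse_rasterize(init_short_list(), [7, 3, 5])))

# A static checker would flag the next line, because the argument is an
# InitializedShortList rather than a PaddedShortList:
# sort_short_list(init_short_list())
```

The tokens never touch the data; they exist only so that a skipped stage shows up as a type mismatch.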

2. What Is Differentiable — and What Isn't

splatBlobs() is marked [Differentiable], but only the last stage, fineRasterize(), actually participates in gradient computation. The first three stages are pure optimisations: they select and reorder Gaussians but produce no differentiable value that feeds into the loss.

Slang's auto-diff engine does not attempt to differentiate through stages that have no path to the output. The programmer still declares the intent explicitly, in two ways: no_diff annotations mark parameters that should not participate, and [Differentiable] is placed only on the functions that matter.

Differentiability boundary. Stages left of the amber line handle integer indices and are not differentiated. fineRasterize works with float Gaussian parameters and is where gradients flow.
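To see why gradients can flow through the blend at all, it helps to check that the blended value is a smooth function of a Gaussian's opacity. The sketch below is a CPU-side illustration in Python, not the Slang auto-diff path. It assumes the blending rule described in the next section (each Gaussian adds α·T·colour, then T is multiplied by 1−α) and compares a hand-derived derivative with a finite difference; all function names are hypothetical.

```python
def blend_red(alphas, reds):
    """Red channel after blending the Gaussians in list order."""
    acc, transmittance = 0.0, 1.0
    for alpha, red in zip(alphas, reds):
        acc += alpha * transmittance * red
        transmittance *= 1.0 - alpha
    return acc

def grad_alpha0_analytic(alphas, reds):
    # Two-Gaussian case: r = a0*c0 + a1*(1 - a0)*c1, so dr/da0 = c0 - a1*c1.
    return reds[0] - alphas[1] * reds[1]

def grad_alpha0_numeric(alphas, reds, h=1e-6):
    # Central finite difference with respect to the first opacity.
    up = blend_red([alphas[0] + h, alphas[1]], reds)
    down = blend_red([alphas[0] - h, alphas[1]], reds)
    return (up - down) / (2.0 * h)

alphas = [0.5, 0.25]
reds = [1.0, 0.5]
analytic = grad_alpha0_analytic(alphas, reds)
numeric = grad_alpha0_numeric(alphas, reds)
```

The integer work in the earlier stages (which Gaussians are selected, and in what order) has no comparable derivative, which is why only the blend sits on the differentiable side of the boundary.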

3. fineRasterize() — the Forward Blend

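Once the short-list is sorted, fineRasterize() accumulates colour contributions from each Gaussian in order using multiplicative alpha blending. Because each contribution is weighted by the transmittance left over from the Gaussians already processed, the list is effectively walked front-to-back. The pixel state tracks accumulated colour and remaining transmittance T: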

[Differentiable]
float4 fineRasterize(SortedShortList, uint localIdx, no_diff float2 uv)
{
    GroupMemoryBarrierWithGroupSync();

    PixelState pixelState = PixelState(float4(0, 0, 0, 1), 0);
    uint count = blobCount;

    for (uint i = 0; i < count; i++)
        pixelState = transformPixelState(pixelState, eval(blobs[i], uv, localIdx));

    maxCount[localIdx] = pixelState.finalCount;
    finalVal[localIdx] = pixelState.value;
    return pixelState.value;
}

The pixel state starts as (rgb=0, a=1) — no accumulated colour, full transmittance. transformPixelState applies each Gaussian's contribution: it adds α·T·colour to the accumulated RGB and multiplies transmittance by (1−α). After all Gaussians, the state's .a field is the final remaining transmittance.
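The update rule described above can be sketched on the CPU as follows. This is a minimal Python illustration, not the Slang transformPixelState: it keeps only the (r, g, b, T) tuple and leaves out the finalCount bookkeeping, and the tuple layout and function names are assumptions made for the example.

```python
def transform_pixel_state(state, gaussian):
    """state = (r, g, b, T); gaussian = (r, g, b, alpha)."""
    r, g, b, T = state
    cr, cg, cb, alpha = gaussian
    # Add alpha * T * colour, then reduce transmittance by (1 - alpha).
    return (r + alpha * T * cr,
            g + alpha * T * cg,
            b + alpha * T * cb,
            T * (1.0 - alpha))

def fine_rasterize(gaussians):
    state = (0.0, 0.0, 0.0, 1.0)  # no colour yet, full transmittance
    for gaussian in gaussians:
        state = transform_pixel_state(state, gaussian)
    return state

# A half-opaque red Gaussian followed by a half-opaque blue one.
final_state = fine_rasterize([(1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 0.5)])
```

In this run the red Gaussian contributes 0.5 and leaves T = 0.5, so the blue one contributes only 0.25, and a quarter of the light remains unabsorbed.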

After the loop, maxCount and finalVal are written to shared memory. The backward pass will need these to reconstruct the final state without storing every intermediate state. See the Differentiable Rendering page for why.

The uv parameter is marked no_diff: the pixel UV coordinate is a fixed input (it depends only on which pixel we're computing, not on any learnable parameter). Marking it no_diff tells the auto-diff engine not to generate gradient paths through it, which reduces the complexity of the generated backward pass.

Step through the fineRasterize loop. Each Gaussian contributes α·T·colour to the accumulation. The swatch shows the current blended pixel colour.

Why store finalVal and maxCount? The backward pass needs to reverse the blending loop. Instead of storing the pixel state at every step (N copies in memory), it stores only the final state and reconstructs earlier states by undoing each step in reverse. This is the state-undo trick — see Differentiable Rendering.
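A rough sketch of the undo step, under the same assumed blending rule (add α·T·colour, multiply T by 1−α): dividing the transmittance by (1−α) recovers the earlier T, and subtracting the contribution recovers the earlier colour. The Python below is illustrative only; the names are hypothetical, and it assumes every α is strictly below 1, since a fully opaque Gaussian would make the division undefined.

```python
import math

def blend(state, gaussian):
    r, g, b, T = state
    cr, cg, cb, alpha = gaussian
    return (r + alpha * T * cr, g + alpha * T * cg, b + alpha * T * cb,
            T * (1.0 - alpha))

def undo_blend(state, gaussian):
    r, g, b, T = state
    cr, cg, cb, alpha = gaussian
    T_prev = T / (1.0 - alpha)  # assumes alpha < 1
    return (r - alpha * T_prev * cr, g - alpha * T_prev * cg,
            b - alpha * T_prev * cb, T_prev)

gaussians = [(1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 0.25), (0.0, 0.0, 1.0, 0.75)]

# Forward pass. The history list exists only to check the result;
# the trick is that the backward walk needs just final_state.
state = (0.0, 0.0, 0.0, 1.0)
history = [state]
for gaussian in gaussians:
    state = blend(state, gaussian)
    history.append(state)
final_state = state

# Backward walk: rebuild each earlier state from the one after it.
recovered = [final_state]
for gaussian in reversed(gaussians):
    recovered.append(undo_blend(recovered[-1], gaussian))
recovered.reverse()
```

The trade is memory for arithmetic: one stored state per pixel instead of N, at the cost of one extra division and a few multiplies per Gaussian on the way back.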

4. Background Compositing

The last line of splatBlobs() composites the accumulated colour against a white background:

return float4(color.rgb * (1.0 - color.a) + color.a, 1.0);

Here color.a is the final remaining transmittance T: the fraction of light that passed through all Gaussians without being absorbed. The formula scales the accumulated colour by (1 − T) and adds white scaled by T. The trailing + color.a is a scalar added to every channel, which is equivalent to adding float3(1,1,1) * color.a.

If all Gaussians are fully opaque, T→0 and the result is entirely the accumulated colour. If all Gaussians are transparent, T→1 and the result is pure white. Partial transparency blends the two — a standard over-white compositing operation used so the learning signal doesn't have to account for an arbitrary background colour.
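The two limiting cases can be checked with a small Python transcription of the return expression. This is a sketch for illustration: the function name and the tuple layout (r, g, b, T) are assumptions, and only the arithmetic is taken from the line above.

```python
def composite_over_white(color):
    """color = (r, g, b, T), where T is the remaining transmittance."""
    r, g, b, T = color
    # Mirrors color.rgb * (1.0 - color.a) + color.a, with alpha forced to 1.
    return tuple(c * (1.0 - T) + T for c in (r, g, b)) + (1.0,)

opaque = composite_over_white((0.2, 0.4, 0.6, 0.0))    # T = 0: colour only
empty = composite_over_white((0.0, 0.0, 0.0, 1.0))     # T = 1: pure white
partial = composite_over_white((0.5, 0.0, 0.25, 0.25))  # in between
```

With T = 0.25, each channel keeps three quarters of its accumulated value and gains a quarter of white.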