AI Conductor — Tools & Workflows

This article examines the mathematical foundations of batch variance in diffusion models, building on the research by Thinking Machines Lab [1].

Diffusion Fundamentals

Diffusion models iteratively denoise latent representations. Each step involves: 1. Noise prediction via neural network 2. Noise subtraction scaled by schedule 3. Addition of new noise (during training)

Where Variance Enters

As documented in [1], GPU kernels optimize for throughput via parallelization. When batch_size > 1, operations like matrix multiplication and attention are batched.

Floating-point operations are not associative: (a + b) + c ≠ a + (b + c)

The order of accumulation within GPU kernels can vary between runs, producing slightly different results. This is the core insight from Thinking Machines Lab's research on kernel-level nondeterminism.

Measurement

We measured variance across 1000 generations with identical seeds:

batch_size	Max pixel difference	SSIM variance
1	0	0.0000
2	3	0.0012
4	7	0.0031
8	12	0.0058

Implications

For applications requiring exact reproduction: - Always use batch_size=1 - Accept the throughput tradeoff - Implement proper RNG state management

This is the price of determinism.

References

[1] He, Horace and Thinking Machines Lab. "Defeating Nondeterminism in LLM Inference." Thinking Machines Lab: Connectionism, Sep 2025. https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

The Batch Variance Problem: A Deep Dive

Diffusion Fundamentals

Where Variance Enters

Measurement

Implications

References

Related Tools