When building reproducible AI pipelines, the first instinct is to set temperature=0 and use a fixed seed. This seems logical—eliminate randomness, get consistent results. But in practice, this isn't enough.
This article draws heavily from the foundational research by Horace He and Thinking Machines Lab on defeating nondeterminism in LLM inference [1].
## The Hidden Source of Variance
The problem lies in batch processing. As documented in [1], when you process multiple items in a batch, the order of operations within GPU kernels (RMSNorm, MatMul, Attention) can vary between runs. This introduces subtle but measurable differences in output.
Consider a batch of 4 images with seed=42:

- Run 1: [img_a, img_b, img_c, img_d]
- Run 2: [img_a', img_b', img_c', img_d']
Even with identical inputs, img_a ≠ img_a'. The differences are small, often imperceptible to the human eye, but they compound across iterations and break byte-level reproducibility: output hashes never match between runs, which defeats any attempt to version-control generated assets.
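The underlying mechanism is floating-point non-associativity: when a GPU kernel's reduction order changes between runs, the rounding changes too, so the "same" sum produces a slightly different value. A minimal Python illustration of the effect:

```python
# Floating-point addition is not associative: summing the same
# terms in a different order can yield a different rounded result.
left = (0.1 + 0.2) + 0.3   # reduce left-to-right
right = 0.1 + (0.2 + 0.3)  # same terms, different grouping

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A GPU reduction over thousands of terms has vastly more orderings than this three-term sum, so the discrepancies are correspondingly harder to pin down.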
## The Solution: `batch_size=1`
Enforcing single-item batches eliminates this variance. Each generation is isolated, with no cross-contamination from parallel processing.
```python
sampler = DeterministicKSampler(
    seed=42,
    batch_size=1,  # Critical
    lock_rng_state=True,
)
```

Yes, this is slower. But for production pipelines where reproducibility matters more than throughput, it's the only reliable approach.
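To make the pattern concrete, here is a stdlib-only sketch of a single-item pipeline loop. The `generate()` function is a hypothetical stand-in for a real sampling call; the point is the structure: one item per call, with an explicit per-item seed so no item's RNG stream depends on its neighbors in a batch.

```python
import random

def generate(prompt: str) -> float:
    # Hypothetical stand-in for a real single-item sampling call;
    # here it just draws from the seeded RNG.
    return random.random()

def run_pipeline(prompts, base_seed=42):
    """Process items one at a time (batch_size=1), reseeding before
    each so every item's RNG stream is isolated from the others."""
    results = []
    for i, prompt in enumerate(prompts):
        random.seed(base_seed + i)  # per-item seed
        results.append(generate(prompt))
    return results

# Two runs with the same inputs and base seed produce identical output.
a = run_pipeline(["cat", "dog", "fox"])
b = run_pipeline(["cat", "dog", "fox"])
assert a == b
```

Deriving each item's seed from a base seed plus its index keeps the whole run reproducible from a single number, while still giving every item an independent stream.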
## Beyond `batch_size`
Even with batch_size=1, multiple RNG sources (PyTorch, NumPy, Python's random) can drift independently. Complete determinism requires capturing and restoring the full RNG state across all sources.
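A minimal sketch of capture-and-restore, shown here for Python's stdlib `random` only. A complete implementation would record the other sources the same way, e.g. `torch.get_rng_state()`, `torch.cuda.get_rng_state_all()`, and `numpy.random.get_state()`; the dictionary structure below is an assumption for illustration.

```python
import random

def capture_rng_state():
    """Snapshot every RNG source the pipeline touches.
    Only Python's stdlib RNG is shown; a full solution would also
    snapshot the PyTorch (CPU + CUDA) and NumPy states."""
    return {"python": random.getstate()}

def restore_rng_state(state):
    """Rewind all captured RNG sources to the snapshot."""
    random.setstate(state["python"])

random.seed(42)
snapshot = capture_rng_state()
first = [random.random() for _ in range(3)]

restore_rng_state(snapshot)  # rewind to the snapshot
replay = [random.random() for _ in range(3)]
assert first == replay       # identical draws after restore
```

Because the full state is saved rather than just the seed, this works mid-run: any intermediate point in a generation can be rewound to exactly, without replaying everything from the start.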
This is what deterministic-nodes provides: a comprehensive solution for truly reproducible AI generation.
## References
[1] He, Horace and Thinking Machines Lab. "Defeating Nondeterminism in LLM Inference." Thinking Machines Lab: Connectionism, Sep 2025. https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/