Reproducibility · STABLE

Deterministic Nodes

ComfyUI · Python · RNG


The dirty secret of "deterministic" AI generation: it isn't. Set temperature to 0, fix your seed, run the same prompt twice, and you'll often get different results. The culprit is batch-size variance—the way modern GPU kernels parallelize computation introduces floating-point accumulation differences when batch sizes change.

This insight comes from ThinkingMachines' research on defeating nondeterminism in LLM inference (see [thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference](https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference)). Their key finding: the commonly cited "concurrency + floating-point" explanation is incomplete. The actual cause is batch-size variance in RMSNorm, MatMul, and Attention kernels. Their solution—batch-invariant kernels—ensures identical numerics regardless of batch size.
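The accumulation effect behind all of this can be reproduced with plain Python floats, no GPU required. The example below is an illustration of floating-point non-associativity, not of any specific kernel:

```python
# Floating-point addition is not associative: the same numbers summed in a
# different order give a different double-precision result. A GPU kernel
# that changes its reduction order when the batch size changes therefore
# changes its output, even with a fixed seed.
xs = [0.1] * 10 + [1e16, -1e16]

forward = sum(xs)             # the ~1.0 partial sum is absorbed by 1e16
backward = sum(reversed(xs))  # 1e16 cancels first, so the 0.1s survive

print(forward, backward)      # two different results from identical inputs
```

This is exactly why "same inputs, same seed" is not enough: the summation order must also be pinned down.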

Deterministic Nodes applies this principle to diffusion models. We enforce batch_size=1 and fix all RNG states at every sampling step. It is slower, since you cannot batch multiple images, but the output is bit-identical across runs and verified across machines and CUDA versions.
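The per-step reseeding idea can be sketched like this, with stdlib `random` standing in for `torch.Generator`. The function names and the seed-derivation formula are illustrative assumptions, not ComfyUI's API:

```python
import random

def step_noise(user_seed, step, n):
    """Noise for one sampling step, fully determined by (user_seed, step).

    Reseeding at every step means the noise at step k never depends on
    batch layout or on how earlier steps executed.
    """
    rng = random.Random(user_seed * 100_003 + step)  # fresh, derived state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sample_one(user_seed, steps=20):
    """batch_size=1: each image gets its own complete sampling loop."""
    return [step_noise(user_seed, k, 4) for k in range(steps)]

# Identical across runs, because nothing but the seed feeds the RNG:
assert sample_one(42) == sample_one(42)
```

Deriving a fresh seed per step, rather than advancing one global stream, also means inserting or skipping a step cannot shift the noise of every step after it.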

The technical approach: we intercept the sampler's RNG calls and replace them with a deterministic sequence seeded from the user's seed, and we force synchronous CUDA execution to eliminate kernel-scheduling variance. The result is guaranteed byte-identical output.
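The interception can be pictured as follows. `Sampler` and `make_deterministic` are hypothetical names for illustration, and stdlib `random` again stands in for CUDA tensors; `CUDA_LAUNCH_BLOCKING=1` is PyTorch's real switch for synchronous kernel launches:

```python
import os
import random

# Force synchronous CUDA kernel launches so scheduling order cannot vary
# between runs (honoured by PyTorch; harmless on a machine without a GPU).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

class Sampler:
    """Stand-in for a stock sampler whose noise calls we intercept."""
    def noise(self, n):
        # Default path: module-global RNG, whose state depends on
        # everything that ran before this call.
        return [random.gauss(0.0, 1.0) for _ in range(n)]

def make_deterministic(sampler, user_seed):
    """Swap the sampler's noise source for a stream seeded only by
    user_seed (hypothetical helper illustrating the interception)."""
    rng = random.Random(user_seed)
    sampler.noise = lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sampler
```

Two samplers wrapped with the same seed then emit identical noise streams, draw for draw, regardless of what else the process has done.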

Features

  • Byte-identical outputs across runs
  • Fixed RNG state at every sampling step
  • Enforced batch_size=1 for accumulation consistency
  • Synchronous CUDA execution mode
  • Cross-machine reproducibility verified
  • Seed logging for reconstruction

Technical Details

  • Intercepts torch.Generator calls at the sampler level
  • Disables cudnn.benchmark for determinism
  • Sets CUBLAS_WORKSPACE_CONFIG for matmul reproducibility
  • ~15% slower than non-deterministic mode
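The settings named above correspond to standard PyTorch determinism knobs. A minimal setup sketch, assuming the usual values (the exact configuration Deterministic Nodes ships is not shown here):

```python
import os

# Must be set before the first CUDA context is created: restricts cuBLAS
# to a fixed workspace so matmul reductions are reproducible.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# No autotuning: cudnn.benchmark picks kernels by timing them at runtime,
# which can select different (numerically different) kernels per run.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Commonly paired with the above: raise an error if any op would fall
# back to a nondeterministic implementation.
torch.use_deterministic_algorithms(True)
```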

Related Processes

CONDUIT