Real-Time Experience Generation
Achieving 60fps interactive generation through tiered, speculative, and hierarchical synthesis. This is the hardest engineering challenge — generating rich interactive experiences at the speed of thought.
Current Frontier
Hierarchical synthesis: decomposing generation into layout, structure, and detail tiers that can be computed in parallel.
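The tiered decomposition can be sketched as independent generators running concurrently, with a compositor merging their outputs. A minimal sketch; the tier functions and their dict payloads are illustrative stand-ins, not any cited system's API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-tier generators: each returns a partial frame
# representation that a compositor would merge into the final image.
def generate_layout(intent):      # coarse scene arrangement
    return {"tier": "layout", "intent": intent}

def generate_structure(intent):   # mid-level geometry and objects
    return {"tier": "structure", "intent": intent}

def generate_detail(intent):      # fine textures and lighting
    return {"tier": "detail", "intent": intent}

def synthesize_frame(intent):
    """Run all tiers in parallel; a real compositor would blend them."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        tiers = pool.map(lambda gen: gen(intent),
                         (generate_layout, generate_structure, generate_detail))
    return list(tiers)

print(synthesize_frame("forest clearing at dusk"))
```

The point of the decomposition is that the slow tier (detail) no longer gates the fast ones: layout can update every frame even if detail lags a frame or two behind.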
Key Questions
What's the minimum latency budget for intent-to-frame generation to feel responsive?
Can speculative generation (predicting likely next states) achieve perceived real-time even with slower actual generation?
Is diffusion or autoregressive generation better suited for real-time interactive output?
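The latency-budget question has a hard arithmetic floor: at a given frame rate, the entire intent-to-frame pipeline must fit inside one frame interval. A trivial calculation, included for concreteness:

```python
def frame_budget_ms(fps):
    """Wall-clock time available to produce one frame, in milliseconds."""
    return 1000.0 / fps

# At 60 fps every stage of generation must fit in ~16.7 ms;
# at 24 fps (Genie 3's reported rate) the budget is ~41.7 ms.
print(round(frame_budget_ms(60), 1))  # 16.7
print(round(frame_budget_ms(24), 1))  # 41.7
```

Note that responsiveness also depends on input-to-photon latency, not just throughput: a pipeline that emits 60 frames per second but queues three frames deep still feels sluggish.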
Key Papers
GameNGen: Diffusion Models Are Real-Time Game Engines
Google Research
Showed real-time interactive generation at 20 fps is possible with a single diffusion model
Genie 2: A Large-Scale Foundation World Model
DeepMind
Demonstrated persistent 3D world generation from minimal input
DIAMOND: Diffusion for World Modeling
Microsoft Research
Showed diffusion models can maintain long-horizon coherence
Oasis: A Universe in a Transformer
Decart
Transformer-based world generation at interactive speeds
Genie 3: A New Frontier for World Models
Google DeepMind (Aug 2025)
24fps, 720p, text-to-interactive-3D. Current SOTA for general world simulation.
StreamDiffusion: Real-Time Interactive Generation
Kodaira et al. (ICCV 2025)
91fps image-to-image on RTX 4090. Proves 60fps neural rendering is achievable today.
Consistency Models
Song et al. (ICML 2023)
Foundational — single-step generation enabling real-time diffusion inference.
SANA-Sprint
NVIDIA (ICCV 2025)
100ms for 1024x1024. Best quality-per-millisecond for image generation.
Shortcut Models
ICLR 2025 Oral
Dynamic quality/speed tradeoff per frame — perfect for variable-complexity scenes.
LTX-Video
Lightricks (2025)
First open-source model generating video faster than playback speed.
Yan — Interactive World Generation
Tencent (Aug 2025)
1080p/60fps interactive via 3D-VAE + KV-cache shift-window denoising.
HY-World 1.5
Tencent (Dec 2025)
24fps with geometric consistency, open source, full pipeline.
Current Insights
GameNGen achieved 20fps with a SINGLE diffusion model. With hierarchical decomposition, 60fps should be achievable.
The key insight from Genie 2: you don't need to generate every pixel from scratch every frame. Persistent elements can be cached and composited.
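That cache-and-composite idea can be sketched with a toy compositor that counts generator calls: persistent layers are rendered once and reused, dynamic layers are regenerated per frame. Class and method names here are illustrative, not from Genie 2:

```python
# Sketch of cache-and-composite frame assembly. The expensive
# generator call is a stand-in for invoking a generative model.
class FrameCompositor:
    def __init__(self):
        self.cache = {}        # element_id -> rendered layer
        self.regenerated = 0   # counts expensive generator calls

    def render_element(self, element_id):
        self.regenerated += 1
        return f"layer({element_id})"  # stand-in for a model call

    def frame(self, static_ids, dynamic_ids):
        layers = []
        for eid in static_ids:   # persistent: generate once, then reuse
            if eid not in self.cache:
                self.cache[eid] = self.render_element(eid)
            layers.append(self.cache[eid])
        for eid in dynamic_ids:  # dynamic: regenerate every frame
            layers.append(self.render_element(eid))
        return layers

c = FrameCompositor()
c.frame(["terrain", "sky"], ["avatar"])
c.frame(["terrain", "sky"], ["avatar"])
print(c.regenerated)  # 4: terrain and sky once each, avatar twice
```

Over two frames the model runs four times instead of six; the savings grow with the ratio of persistent to dynamic content in the scene.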
Speculative generation — pre-computing likely next states — could mask latency entirely.
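A toy version of that latency-masking scheme: while the current frame is on screen, pre-render frames for the most likely next inputs; on a prediction hit the frame is served instantly. The predictor and generator callables are assumptions for illustration:

```python
# Sketch of speculative frame generation. On a hit, the pre-rendered
# frame is returned with no added latency; on a miss we pay full cost.
class SpeculativeRenderer:
    def __init__(self, predict_next, generate):
        self.predict_next = predict_next  # state -> likely next inputs
        self.generate = generate          # (state, input) -> frame
        self.speculated = {}

    def speculate(self, state):
        """Pre-render frames for predicted inputs during idle time."""
        self.speculated = {
            inp: self.generate(state, inp)
            for inp in self.predict_next(state)
        }

    def step(self, state, actual_input):
        if actual_input in self.speculated:
            return self.speculated[actual_input], True   # hit
        return self.generate(state, actual_input), False  # miss

r = SpeculativeRenderer(
    predict_next=lambda s: ["forward", "left"],   # hypothetical predictor
    generate=lambda s, a: f"frame({s},{a})",      # stand-in for the model
)
r.speculate("t0")
print(r.step("t0", "forward"))  # ('frame(t0,forward)', True)
print(r.step("t0", "jump"))     # ('frame(t0,jump)', False)
```

The perceived latency then depends on the predictor's hit rate: with a high hit rate, actual generation can run well behind real time while interaction still feels instantaneous.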