GameNGen: Diffusion Models Are Real-Time Game Engines
Valevski et al. (Google Research)
DOOM at 20fps on a single TPU. PSNR 29.4. Noise augmentation mitigates autoregressive drift. ICLR 2025.
Alonso et al. (Geneva/Edinburgh)
Diffusion world model for Atari. Trains in 3 days on RTX 4090. Scales to Counter-Strike at 381M params. NeurIPS 2024 Spotlight. Open source.
Decart/Etched
Real-time Minecraft generation. V2V layer transforms state into aesthetics at 1080p/30fps. $53M from Sequoia.
Google DeepMind
11B-parameter foundation world model. Persistent 3D worlds from single images.
Google DeepMind
24fps, 720p, text-to-interactive-3D. Current SOTA for general world simulation. Multi-minute spatial coherence.
Aki et al.
91fps image-to-image on RTX 4090. Proves 60fps neural rendering is achievable today. ICCV 2025.
Song et al.
Foundational — single-step generation enabling real-time diffusion inference. ICML 2023.
NVIDIA
100ms for 1024x1024. Best quality-per-millisecond for image generation. ICCV 2025.
Dec 2025
100-200x end-to-end speedup on RTX 5090. Open source.
ICLR 2025 Oral
Dynamic quality/speed tradeoff per frame — perfect for variable-complexity scenes.
Lightricks
First open-source model generating video faster than playback speed.
CVPR 2025
Distills bidirectional diffusion into causal streaming at 9.4fps.
Tencent
Claims 1080p/60fps interactive via 3D-VAE + KV-cache shift-window denoising.
Tencent
24fps with geometric consistency, open source, full pipeline.
Chen et al.
Bridges autoregressive and diffusion approaches. Variable-horizon coherence without catastrophic drift. NeurIPS 2024.
March 2026
Persistent 3D state in latent space, camera as query, geometric consistency by construction. THE most important paper for state coherence.
NeurIPS 2025
+14dB PSNR over DIAMOND, 50-step coherence vs 4-step via dual SSM+diffusion architecture.
NeurIPS 2025
Replaces DIAMOND's LSTM with Mamba SSM for immediate coherence gains.
NeurIPS 2025 Spotlight
Solves autoregressive drift by training on self-generated outputs. Directly attacks state drift.
March 2026
External memory for diffusion game engines. Decomposes into Memory + Observation + Dynamics modules.
ICCV 2025
Block-wise SSM extends temporal memory beyond attention windows.
Hafner et al. (Nature 2025)
General world model maintaining state across 150+ diverse tasks with single config.
MBZUAI (Nov 2025)
Natural language control of world actions. Highest fidelity among open-source models.
ICML 2025
Bridges LLMs (intent/narrative) with world models (physics/dynamics). THE architecture paper for intent understanding.
Google (Dec 2025)
Declarative intent → rendered output protocol. Shows how to structure intent-to-experience pipelines.
DeepMind (ICLR 2024)
Simulates both high-level instructions and low-level controls. Multi-modal intent understanding.
ICLR 2025
InstructNet for interactive control of generated game content from natural language instructions.
ICLR 2026 Oral
First benchmark platform for closed-loop world model evaluation. Key finding: controllability matters more than visual quality.
Microsoft (ICCV 2025 Workshop)
LogicNet for numerical consistency, spatial memory. Directly attacks score-tracking and physics verification.
Sep 2025
60.6% improvement in physics consistency. Soft mask training + warm start inference.
Jul 2025
Systematically catalogs failure modes — shallow coherence, error explosion, generality limits.
May 2025
Converts pretrained video diffusion into interactive world models. THE bridge between passive and interactive systems.
NVIDIA (Jan 2025)
Foundation model platform with fine-tuning pipeline. Shows how to build boundary interfaces for domain-specific deployment.
Wayve (March 2025)
Domain-specific world model for autonomous driving. Demonstrates rigid-system integration with sensors, maps, controls.
May 2024
Challenges the assumption that visual quality implies world understanding. Critical framing for trust calibration.
ACM CSUR 2025
Comprehensive survey of world model approaches. Taxonomy and comparison.
DeepMind (ICML 2024)
Foundation work. 11B-param model generating 2D platformer environments from images.
ICLR 2023
Early discrete-token world model. Pioneered autoregressive game simulation.
Meta (Jun 2025)
Video prediction via joint-embedding. Self-supervised world model approach.
Microsoft (Apr 2025)
Minecraft world generation with structured reasoning.
ICCV 2025 Highlight
Modular game generation pipeline from descriptions.
More papers being reviewed and added continuously.