Paper Library

Key research papers and notes. Organized by relevance and research pillar.

2024 | high | Real-Time

GameNGen: Diffusion Models Are Real-Time Game Engines

Valevski et al. (Google Research)

DOOM at 20fps on a single TPU. PSNR 29.4. Noise augmentation of context frames mitigates autoregressive drift. ICLR 2025.
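The noise-augmentation idea in this entry can be sketched roughly as follows: during training, the past frames fed to the model as context are corrupted with Gaussian noise at a randomly sampled level, and that level is passed to the model so it learns to correct imperfect context. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the flat-list frame layout, and the default values of `max_level` and `num_buckets` are all hypothetical.

```python
import random


def noise_augment_context(frames, max_level=0.7, num_buckets=10, rng=random):
    """Corrupt context frames with Gaussian noise at a sampled level.

    frames: list of frames, each a flat list of floats (hypothetical layout).
    Returns (noisy_frames, bucket), where bucket is the discretized noise
    level that would be given to the model as an extra conditioning input.
    The defaults are illustrative assumptions, not values from the source.
    """
    # Sample one noise level for the whole context window.
    level = rng.uniform(0.0, max_level)
    # Discretize the level so it can index an embedding table.
    bucket = min(int(level / max_level * num_buckets), num_buckets - 1)
    # Add zero-mean Gaussian noise to every value; the input is not mutated.
    noisy = [[x + rng.gauss(0.0, level) for x in frame] for frame in frames]
    return noisy, bucket


if __name__ == "__main__":
    context = [[0.0, 0.5, 1.0] for _ in range(4)]
    noisy, bucket = noise_augment_context(context, rng=random.Random(0))
    print(bucket, noisy[0])
```

At inference the context consists of the model's own earlier outputs, so a model trained this way has already seen degraded context and is less likely to compound its errors.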

2024 | high | Real-Time, State

DIAMOND: Diffusion for World Modeling

Alonso et al. (Geneva/Edinburgh)

Diffusion world model for Atari. Trains in 3 days on RTX 4090. Scales to Counter-Strike at 381M params. NeurIPS 2024 Spotlight. Open source.

2024 | high | Real-Time

Oasis: A Universe in a Transformer

Decart/Etched

Real-time Minecraft generation. V2V layer transforms state into aesthetics at 1080p/30fps. $53M from Sequoia.

2024 | high | Real-Time, State

Genie 2: A Large-Scale Foundation World Model

Google DeepMind

Large-scale foundation world model (parameter count not published). Persistent 3D worlds from single images.

2025 | high | Real-Time, Intent

Genie 3: A New Frontier for World Models

Google DeepMind

24fps, 720p, text-to-interactive-3D. Current SOTA for general world simulation. Multi-minute spatial coherence.

2025 | high | Real-Time

StreamDiffusion: Real-Time Interactive Generation

Kodaira et al.

91fps image-to-image on RTX 4090. Shows that neural rendering above 60fps is achievable on a consumer GPU today. ICCV 2025.

2023 | high | Real-Time

Consistency Models

Song et al.

Foundational — single-step generation enabling real-time diffusion inference. ICML 2023.

2025 | high | Real-Time

SANA-Sprint

NVIDIA

100ms per 1024x1024 image. Among the best quality-per-millisecond results for image generation. ICCV 2025.
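To compare the latency and frame-rate figures quoted across these entries, it helps to convert between the two. The helpers below are a small sketch with hypothetical names; they assume frames are generated strictly one after another, with no batching or pipelining, which real systems may not satisfy.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


def max_fps(latency_ms):
    """Upper bound on frame rate if each frame takes latency_ms to produce."""
    return 1000.0 / latency_ms


def fits_budget(latency_ms, target_fps):
    """True if a per-frame latency is within the budget of target_fps."""
    return latency_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    # 100 ms per image caps sequential generation at 10 frames per second.
    print(max_fps(100.0))
    # A 60fps target leaves roughly 16.7 ms per frame.
    print(round(frame_budget_ms(60), 1))
    # A 91fps pipeline spends about 11 ms per frame, inside the 60fps budget.
    print(fits_budget(frame_budget_ms(91), 60))
```

Under that sequential assumption, a 100ms-per-image model is a quality reference point rather than a real-time renderer, while the 91fps figure quoted for StreamDiffusion clears a 60fps budget with a few milliseconds to spare.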

2025 | medium | Real-Time

TurboDiffusion

Dec 2025

100-200x end-to-end speedup on RTX 5090. Open source.

2025 | high | Real-Time

Shortcut Models

ICLR 2025 Oral

Dynamic quality/speed tradeoff per frame — perfect for variable-complexity scenes.

2025 | medium | Real-Time

LTX-Video

Lightricks

First open-source model generating video faster than playback speed.

2025 | medium | Real-Time

CausVid

CVPR 2025

Distills bidirectional diffusion into causal streaming at 9.4fps.

2025 | high | Real-Time

Yan — Interactive World Generation

Tencent

Claims 1080p/60fps interactive via 3D-VAE + KV-cache shift-window denoising.

2025 | medium | Real-Time

HY-World 1.5

Tencent

24fps with geometric consistency, open source, full pipeline.

2024 | high | State, Real-Time

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Chen et al.

Bridges autoregressive and diffusion approaches. Variable-horizon coherence without catastrophic drift. NeurIPS 2024.

2026 | high | State

PERSIST: Persistent 3D World Generation

March 2026

Persistent 3D state in latent space, camera as query, geometric consistency by construction. THE most important paper for state coherence.

2025 | high | State

StateSpaceDiffuser

NeurIPS 2025

+14dB PSNR over DIAMOND, 50-step coherence vs 4-step via dual SSM+diffusion architecture.
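PSNR figures recur in this list, so a short worked computation may help put the +14dB gain above in perspective. The sketch below uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE); the function names and the flat-list image layout are hypothetical, and the conversion assumes both models are scored against the same peak value.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] scale gives MSE 0.01, about 20 dB.
    print(round(psnr([0.0, 0.0], [0.1, 0.1]), 2))
    # A 14 dB gain corresponds to roughly 25x lower mean squared error.
    print(round(mse_ratio_for_gain(14), 1))
```

Because the scale is logarithmic, every 10 dB is a tenfold reduction in mean squared error, so a 14dB difference is a large gap, not an incremental one.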

2025 | high | State

EDELINE: SSM World Models

NeurIPS 2025

Replaces DIAMOND's LSTM with Mamba SSM for immediate coherence gains.

2025 | high | State

Self Forcing

NeurIPS 2025 Spotlight

Solves autoregressive drift by training on self-generated outputs. Directly attacks state drift.

2026 | high | State

MultiGen: Memory for Diffusion Game Engines

March 2026

External memory for diffusion game engines. Decomposes into Memory + Observation + Dynamics modules.

2025 | medium | State

Long-Context SSM Video World Models

ICCV 2025

Block-wise SSM extends temporal memory beyond attention windows.

2025 | high | State

DreamerV3

Hafner et al. (Nature 2025)

General world model maintaining state across 150+ diverse tasks with single config.

2025 | high | Intent

PAN: Language-Conditioned World Actions

MBZUAI (Nov 2025)

Natural language control of world actions. Highest fidelity among open-source models.

2025 | high | Intent

FOUNDER: Bridging LLMs with World Models

ICML 2025

Bridges LLMs (intent/narrative) with world models (physics/dynamics). THE architecture paper for intent understanding.

2025 | high | Intent, Boundary

A2UI Protocol

Google (Dec 2025)

Declarative intent → rendered output protocol. Shows how to structure intent-to-experience pipelines.

2024 | high | Intent

UniSim: Universal Simulation

DeepMind (ICLR 2024)

Simulates both high-level instructions and low-level controls. Multi-modal intent understanding.

2025 | medium | Intent

GameGen-X: Interactive Game Generation

ICLR 2025

InstructNet for interactive control of generated game content from natural language instructions.

2026 | high | Verification, Trust

World-in-World: Closed-Loop WM Evaluation

ICLR 2026 Oral

First benchmark platform for closed-loop world model evaluation. Key finding: controllability matters more than visual quality.

2025 | high | Verification

MaaG: Model as a Game

Microsoft (ICCV 2025 Workshop)

LogicNet for numerical consistency, spatial memory. Directly attacks score-tracking and physics verification.

2025 | high | Verification

PIWM: Physics-Informed World Model

Sep 2025

60.6% improvement in physics consistency. Soft mask training + warm start inference.

2025 | high | Verification, Trust

Critiques of World Models as Planning

Jul 2025

Systematically catalogs failure modes — shallow coherence, error explosion, generality limits.

2025 | high | Boundary

Vid2World: Interactive Worlds from Video Diffusion

May 2025

Converts pretrained video diffusion into interactive world models. THE bridge between passive and interactive systems.

2025 | high | Boundary

NVIDIA Cosmos: World Foundation Models

NVIDIA (Jan 2025)

Foundation model platform with fine-tuning pipeline. Shows how to build boundary interfaces for domain-specific deployment.

2025 | high | Boundary

Wayve GAIA-2: Driving World Model

Wayve (March 2025)

Domain-specific world model for autonomous driving. Demonstrates rigid-system integration with sensors, maps, controls.

2024 | high | Trust

Is Sora a World Simulator?

May 2024

Challenges the assumption that visual quality implies world understanding. Critical framing for trust calibration.

2025 | medium | Real-Time, State

Understanding World or Predicting Future?

ACM CSUR 2025

Comprehensive survey of world model approaches. Taxonomy and comparison.

2024 | medium | Real-Time

Genie 1: Generative Interactive Environments

DeepMind (ICML 2024)

Foundation work. 11B-param model generating 2D platformer environments from images.

2023 | medium | Real-Time, State

IRIS: Imagination with Auto-Regression over an Inner Speech

ICLR 2023

Early discrete-token world model. Pioneered autoregressive game simulation.

2025 | medium | Real-Time

V-JEPA 2

Meta (Jun 2025)

Video prediction via joint-embedding. Self-supervised world model approach.

2025 | medium | Real-Time

MineWorld

Microsoft (Apr 2025)

Minecraft world generation with structured reasoning.

2025 | medium | Real-Time

GameFactory

ICCV 2025 Highlight

Modular game generation pipeline from descriptions.

More papers are being reviewed and added continuously.