Research Pillar

Real-Time Experience Generation

Achieving 60fps interactive generation through tiered, speculative, and hierarchical synthesis. This is the pillar's hardest engineering challenge: generating rich interactive experiences at the speed of thought.

Current Frontier

Hierarchical synthesis: decomposing generation into layout, structure, and detail tiers that can be computed in parallel.
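
The tiered decomposition above can be sketched as a tiny pipeline. All function and tier names here are hypothetical illustrations, not the API of any real system; the point is that the three tiers are independent enough to dispatch concurrently.

```python
# Sketch of hierarchical synthesis: decompose one frame into layout,
# structure, and detail tiers computed in parallel. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def generate_layout(intent):
    # Coarse tier: scene graph / spatial arrangement (cheap, low latency).
    return {"tier": "layout", "intent": intent}

def generate_structure(intent):
    # Mid tier: object geometry and motion, conditioned on intent.
    return {"tier": "structure", "intent": intent}

def generate_detail(intent):
    # Fine tier: textures, lighting, per-pixel detail (most expensive).
    return {"tier": "detail", "intent": intent}

def synthesize_frame(intent):
    # Dispatch all three tiers concurrently; a real system would feed
    # coarse outputs into finer tiers across successive frames.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, intent)
                   for fn in (generate_layout, generate_structure, generate_detail)]
        return [f.result() for f in futures]

frame = synthesize_frame("walk forward")
print([t["tier"] for t in frame])  # → ['layout', 'structure', 'detail']
```

In a production pipeline the layout tier would run ahead of the detail tier by a frame or two, so the expensive tier always has a coarse scaffold to condition on.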

Key Questions

01

What's the minimum latency budget for intent-to-frame generation to feel responsive?

02

Can speculative generation (predicting likely next states) achieve perceived real-time even with slower actual generation?

03

Is diffusion or autoregressive generation better suited for real-time interactive output?
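
Questions 01 and 02 come down to arithmetic on latency budgets. A minimal back-of-envelope model, using the common HCI rule of thumb that ~100ms feels instantaneous (the hit-rate model below is a simplifying assumption, not a result from any cited paper):

```python
# Back-of-envelope latency budget for intent-to-frame generation.
FPS_TARGET = 60
frame_budget_ms = 1000 / FPS_TARGET   # ≈ 16.7 ms of compute per frame
perceived_instant_ms = 100            # classic "feels instant" threshold

def effective_latency(gen_ms, hit_rate):
    # Expected perceived latency when a speculative cache serves the
    # next state with probability hit_rate (hypothetical simple model:
    # a hit costs ~0 ms, a miss costs the full generation time).
    return (1 - hit_rate) * gen_ms

print(round(frame_budget_ms, 1))                  # → 16.7
print(round(effective_latency(50.0, 0.8), 1))     # → 10.0
```

Under this toy model, a generator that needs 50ms per frame but predicts the next state correctly 80% of the time has an expected perceived latency of 10ms, comfortably inside the instantaneous threshold even though actual generation is 3× slower than the 60fps budget.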

Key Papers

GameNGen: Diffusion Models Are Real-Time Game Engines

Google Research

Showed that real-time interactive game simulation (DOOM) is possible at 20+ fps with a single diffusion model

Genie 2: A Large-Scale Foundation World Model

DeepMind

Demonstrated persistent 3D world generation from minimal input

DIAMOND: Diffusion for World Modeling

Microsoft Research

Showed diffusion models can maintain long-horizon coherence

Oasis: A Universe in a Transformer

Decart

Transformer-based world generation at interactive speeds

Genie 3: A New Frontier for World Models

Google DeepMind (Aug 2025)

24fps, 720p, text-to-interactive-3D. Current SOTA for general world simulation.

StreamDiffusion: Real-Time Interactive Generation

Kodaira et al. (ICCV 2025)

91fps image-to-image on RTX 4090. Proves 60fps neural rendering is achievable today.

Consistency Models

Song et al. (ICML 2023)

Foundational — single-step generation enabling real-time diffusion inference.

SANA-Sprint

NVIDIA (ICCV 2025)

100ms for 1024x1024. Best quality-per-millisecond for image generation.

TurboDiffusion

Dec 2025

100-200x end-to-end speedup, open source on RTX 5090.

Shortcut Models

ICLR 2025 Oral

Dynamic quality/speed tradeoff per frame — perfect for variable-complexity scenes.

LTX-Video

Lightricks (2025)

First open-source model generating video faster than playback speed.

CausVid

CVPR 2025

Distills bidirectional diffusion into causal streaming at 9.4fps.

Yan — Interactive World Generation

Tencent (Aug 2025)

1080p/60fps interactive via 3D-VAE + KV-cache shift-window denoising.

HY-World 1.5

Tencent (Dec 2025)

24fps with geometric consistency, open source, full pipeline.

Current Insights

GameNGen achieved 20fps with a single diffusion model. With hierarchical decomposition, 60fps should be achievable.

The key insight from Genie 2: you don't need to generate every pixel from scratch every frame. Persistent elements can be cached and composited.
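
That caching insight can be sketched as a dirty-flag compositor: persistent layers are rendered once and reused, and only changed layers hit the generator. The class and field names below are hypothetical, and the "rendering" is stubbed with strings.

```python
# Sketch of cache-and-composite: regenerate only dirty scene elements,
# reuse cached layers for everything else. Names are illustrative.

class FrameCompositor:
    def __init__(self):
        self.cache = {}  # element id -> last rendered content

    def render(self, elements):
        # elements: {id: (content_fn, dirty)}. Only dirty or uncached
        # elements invoke the (expensive) generator stub content_fn.
        regenerated = 0
        for eid, (content_fn, dirty) in elements.items():
            if dirty or eid not in self.cache:
                self.cache[eid] = content_fn()
                regenerated += 1
        # Composite = ordered overlay of cached layers (stubbed as a list).
        return list(self.cache.values()), regenerated

comp = FrameCompositor()
scene = {"sky": (lambda: "sky_px", False),
         "terrain": (lambda: "terrain_px", False),
         "avatar": (lambda: "avatar_px", True)}
comp.render(scene)        # first frame: all 3 layers generated
_, n = comp.render(scene)
print(n)  # → 1 (only the dirty avatar layer is regenerated)
```

With most of the frame static, per-frame generator work drops from "all layers" to "moving layers", which is exactly the budget slack hierarchical synthesis needs.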

Speculative generation — pre-computing likely next states — could mask latency entirely.
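
The speculation idea above can be sketched as a small buffer: guess the user's likely next actions, pre-generate those frames during idle time, and serve instantly on a correct guess. Everything here (class, predictor, generator stub) is a hypothetical illustration.

```python
# Sketch of speculative generation: pre-compute frames for likely next
# actions so a correct prediction serves with ~zero perceived latency.

def generate_frame(state, action):
    # Stand-in for the slow generative model.
    return f"frame({state},{action})"

class SpeculativeBuffer:
    def __init__(self, predict_actions):
        self.predict_actions = predict_actions  # guesses likely next actions
        self.buffer = {}

    def speculate(self, state):
        # Pre-generate a frame per predicted action (ideally on idle compute).
        self.buffer = {a: generate_frame(state, a)
                       for a in self.predict_actions(state)}

    def step(self, state, action):
        # Hit: serve the pre-generated frame immediately.
        if action in self.buffer:
            return self.buffer[action], True
        # Miss: fall back to slow on-demand generation.
        return generate_frame(state, action), False

buf = SpeculativeBuffer(lambda s: ["forward", "left"])
buf.speculate("s0")
frame, hit = buf.step("s0", "forward")
print(hit)  # → True
```

The hit rate of the action predictor is the whole game here: every hit hides the full generation latency, while misses pay it in full, so perceived responsiveness scales directly with prediction quality.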