Research Pillar

Behavioral Verification

Continuous runtime verification derived from intent, not human-written tests. How do you know generated software is correct when there's no spec?

Current Frontier

Runtime invariant generation: automatically deriving behavioral invariants from intent specifications.
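To make the idea concrete, here is a minimal sketch of what a runtime invariant might look like once derived from an intent specification. Everything here is illustrative: the `Invariant` type, the example intent ("a platformer where collecting coins raises the score"), and the state-dict format are assumptions, not an existing system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shape: an invariant pairs a human-readable intent
# fragment with a runtime predicate over successive states.
@dataclass
class Invariant:
    intent: str
    check: Callable[[dict, dict], bool]  # (prev_state, cur_state) -> ok?

# Invariants one might derive from the intent
# "a platformer where collecting coins raises the score":
INVARIANTS = [
    Invariant("score never decreases",
              lambda prev, cur: cur["score"] >= prev["score"]),
    Invariant("score rises exactly when coins are collected",
              lambda prev, cur: (cur["score"] > prev["score"])
                                == (cur["coins"] > prev["coins"])),
]

def verify_step(prev: dict, cur: dict) -> list[str]:
    """Return the intents violated by a single state transition."""
    return [inv.intent for inv in INVARIANTS if not inv.check(prev, cur)]
```

The point of the sketch is that the invariants are predicates over observed behavior, so they can be checked without any human-written test suite.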

Key Questions

1. How do you verify correctness when there's no traditional spec to test against?

2. Can behavioral invariants be derived automatically from intent?

3. What does 'correct' even mean for generated software?

Key Papers

World-in-World: Closed-Loop WM Evaluation

ICLR 2026 Oral

First benchmark platform for closed-loop world model evaluation. Key finding: controllability matters more than visual quality.


MaaG: Model as a Game

Microsoft (ICCV 2025 Workshop)

Introduces LogicNet for numerical consistency and spatial memory; directly attacks score-tracking and physics verification.


PIWM: Physics-Informed World Model

Sep 2025

Reports a 60.6% improvement in physics consistency via soft-mask training and warm-start inference.


Critiques of World Models as Planning

Jul 2025

Systematically catalogs failure modes: shallow coherence, error explosion, and generality limits.


Current Insights

In The Last Computer, 'correct' means 'matches intent.' Verification becomes: does the generated experience do what the user wanted?

Continuous verification, checking invariants at runtime on every frame, is more powerful than traditional testing: it catches violations whenever they occur, not only on the inputs a test author anticipated.
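A per-frame monitor can be sketched in a few lines, assuming the generated experience exposes its state as a dict each frame. The `run_verified` helper and the frame format are hypothetical, chosen only to show the shape of continuous verification.

```python
def run_verified(frames, invariant):
    """Check one invariant on every consecutive pair of frames.

    `frames` is an iterable of state dicts; `invariant` maps
    (prev, cur) -> bool. Violations are collected rather than raised,
    so the experience keeps running while the monitor records evidence.
    """
    violations = []
    prev = None
    for i, cur in enumerate(frames):
        if prev is not None and not invariant(prev, cur):
            violations.append(i)  # record the frame index of the violation
        prev = cur
    return violations

# Usage: a non-decreasing score, checked over a short frame trace.
frames = [{"score": 0}, {"score": 1}, {"score": 0}, {"score": 2}]
bad = run_verified(frames, lambda p, c: c["score"] >= p["score"])
```

Because the monitor sees every transition, a one-frame glitch that a sampled end-to-end test would miss still leaves a recorded violation.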