Behavioral Verification
Continuous runtime verification derived from intent, not human-written tests. How do you know generated software is correct when there's no spec?
Current Frontier
Runtime invariant generation: automatically deriving behavioral invariants from intent specifications.
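As a minimal sketch of the idea (all names and helpers here are hypothetical illustrations, not drawn from any of the papers below), an intent phrase such as "the score never decreases" can be lowered into a runtime invariant: a named predicate over successive states that can be checked continuously.

```python
# Hypothetical sketch: representing intent-derived behavioral invariants
# as named predicates over (previous_state, current_state) pairs.
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]

@dataclass
class Invariant:
    name: str
    check: Callable[[State, State], bool]  # (prev, curr) -> holds?

def monotone_nondecreasing(field: str) -> Invariant:
    """Invariant for intents like 'the score never decreases'."""
    return Invariant(
        name=f"{field} never decreases",
        check=lambda prev, curr: curr[field] >= prev[field],
    )

def bounded(field: str, lo: float, hi: float) -> Invariant:
    """Invariant for intents like 'health stays between 0 and 100'."""
    return Invariant(
        name=f"{field} in [{lo}, {hi}]",
        check=lambda prev, curr: lo <= curr[field] <= hi,
    )

score_inv = monotone_nondecreasing("score")
assert score_inv.check({"score": 3}, {"score": 5})      # holds
assert not score_inv.check({"score": 5}, {"score": 3})  # violated
```

The open research question is the step this sketch assumes away: mapping free-form intent to the right predicate constructors automatically.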
Key Questions
How do you verify correctness when there's no traditional spec to test against?
Can behavioral invariants be derived automatically from intent?
What does 'correct' even mean for generated software?
Key Papers
World-in-World: Closed-Loop WM Evaluation
ICLR 2026 Oral
First benchmark platform for closed-loop world model evaluation. Key finding: controllability matters more than visual quality.
MaaG: Model as a Game
Microsoft (ICCV 2025 Workshop)
Introduces LogicNet for numerical consistency and spatial memory. Directly attacks score-tracking and physics verification.
PIWM: Physics-Informed World Model
Sep 2025
Reports a 60.6% improvement in physics consistency via soft-mask training and warm-start inference.
Critiques of World Models as Planning
Jul 2025
Systematically catalogs failure modes: shallow coherence, error explosion, and generality limits.
Current Insights
In The Last Computer, 'correct' means 'matches intent.' Verification becomes: does the generated experience do what the user wanted?
Continuous verification, checking invariants at runtime on every frame, is more powerful than traditional testing: it exercises the actual generated behavior as it runs rather than a fixed set of pre-written cases.
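A minimal sketch of what per-frame checking could look like (a hypothetical monitor, not an implementation from the papers above): every frame's state is checked against each derived invariant, and violations are surfaced immediately with the frame where they occurred.

```python
# Hypothetical per-frame invariant monitor. Each invariant is a
# predicate over (previous_state, current_state); violations are
# recorded with the frame index so generation can be corrected.
from typing import Any, Callable

State = dict[str, Any]
Check = Callable[[State, State], bool]

class FrameMonitor:
    def __init__(self, invariants: dict[str, Check]):
        self.invariants = invariants
        self.prev: State | None = None
        self.frame = 0
        self.violations: list[tuple[int, str]] = []

    def observe(self, state: State) -> None:
        """Check every invariant against this frame's state."""
        if self.prev is not None:
            for name, check in self.invariants.items():
                if not check(self.prev, state):
                    self.violations.append((self.frame, name))
        self.prev = state
        self.frame += 1

monitor = FrameMonitor({
    "score never decreases": lambda p, c: c["score"] >= p["score"],
})
for s in [{"score": 0}, {"score": 2}, {"score": 1}]:  # frame 2 regresses
    monitor.observe(s)
print(monitor.violations)  # -> [(2, 'score never decreases')]
```

Because the monitor sees the real generated stream, a violation is evidence that the experience diverged from intent at a specific moment, which is exactly the signal a one-shot test suite cannot provide.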