Agent Infrastructure

Forgiveness, Not Permission: Running Agents On Production Data

As adoption of coding assistants ramps up, the data industry’s focus is rapidly shifting toward making agents trustworthy, often by sacrificing their capabilities. We argue that this is the wrong problem to solve: systems should not need to trust agents, but should instead be robust to mistakes and correct under concurrency. To that end, we introduce a correct-by-design lakehouse, where illegal states are (provably) unrepresentable: ill-typed pipelines should not be planned, inconsistent plans should not be run, failed runs should not be published. By combining Git-like APIs with MVCC-style transactions, this yields agentic infrastructure with correctness guarantees for both humans and agents: data changes are always immutable but never fatal. We conclude by sharing best practices from the trenches for automation in data engineering, and by outlining open challenges as we rethink OLAP infrastructure for agentic AI.
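To make the combination of Git-like branching and MVCC-style publication concrete, here is a minimal in-memory sketch. It is not the lakehouse described in the talk: `Catalog`, `MergeConflict`, and `run_pipeline` are hypothetical names invented for illustration, and the snapshot model (one dictionary per commit, fast-forward-only merges) is an assumption chosen for brevity.

```python
from types import MappingProxyType


class MergeConflict(Exception):
    """Raised when main has moved since the branch was created."""


class Catalog:
    """Hypothetical toy catalog: each branch points at an immutable snapshot."""

    def __init__(self):
        self._heads = {"main": {}}  # branch name -> snapshot (table -> rows)
        self._base = {}             # branch name -> snapshot of main at fork time

    def branch(self, name):
        # Forking is free: the branch shares main's current snapshot.
        self._heads[name] = self._heads["main"]
        self._base[name] = self._heads["main"]

    def read(self, branch):
        # Read-only view, so callers cannot mutate a snapshot in place.
        return MappingProxyType(self._heads[branch])

    def write(self, branch, table, rows):
        # Copy-on-write: build a new snapshot, leave the old one untouched.
        self._heads[branch] = {**self._heads[branch], table: tuple(rows)}

    def drop(self, branch):
        self._heads.pop(branch)
        self._base.pop(branch)

    def merge(self, branch):
        # Optimistic check: publish only if main is still the snapshot
        # this branch was forked from (fast-forward only).
        if self._heads["main"] is not self._base[branch]:
            raise MergeConflict(f"main changed since {branch!r} was created")
        self._heads["main"] = self._heads.pop(branch)
        del self._base[branch]


def run_pipeline(catalog, name, steps):
    """Run all steps on a scratch branch; publish to main only on full success."""
    catalog.branch(name)
    try:
        for table, step in steps:
            catalog.write(name, table, step(catalog.read(name)))
    except Exception:
        catalog.drop(name)  # failed run: main never sees partial output
        return False
    catalog.merge(name)     # atomic publish; may raise MergeConflict
    return True
```

In this sketch a failing step leaves `main` exactly as it was, and two branches forked from the same snapshot cannot both publish: the second merge raises `MergeConflict` rather than silently overwriting the first. A real system would need to rebase or retry in that case, which the sketch deliberately leaves out.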

Speakers