How Action Bias Breaks Autonomous Software Maintenance
Coding agents are increasingly trusted to resolve issues end-to-end: investigate, patch, and ship, without a human in the loop. But in real-world maintenance, a large fraction of incoming bug reports describe issues that are already fixed. A competent maintainer moves on. Current agents don't. On our new benchmark, FixedBench, frontier models apply unnecessary edits to already-correct code in 35-65% of cases, even with full git history and a working environment. More reasoning doesn't help. Better prompts help, but trade one failure mode for another. Today's training rewards producing patches, not deciding whether one is needed. At scale, those unnecessary edits quietly compound into technical debt. The fix starts with framing inaction as a valid success state.
Speakers