Engineering · Mar 21, 2026

Teaching Agents to Say 'I Don't Know'

We shipped Phase 1 and 2 of self-improvement to Sabine: structural honesty and programmatic enforcement. Here's what changed and why it matters for autonomous agent reliability.

The worst kind of failure is the silent one: a system that confidently proceeds when it shouldn't, producing wrong results or taking incorrect actions without any warning. For autonomous agents, which act without a human reviewing each step, the problem is even more acute.

This week we shipped Phase 1 and 2 of self-improvement capabilities to Sabine, our AI partnership platform. The core innovation: teaching agents to recognize their own limitations and enforce them programmatically.

What Shipped

Phase 1: Structural Honesty — The agent can now detect when a task exceeds its current capabilities and communicate that clearly. Instead of attempting something it can't do well (or at all), it surfaces the limitation upfront.

Phase 2: Programmatic Enforcement — Detection alone isn't enough. The system now enforces decision boundaries at runtime, preventing the agent from proceeding with actions it has flagged as beyond its capability. This turns awareness into actual safety.

Why It Matters

Autonomous agents need to be reliable enough to run unsupervised, but they also need to know when to ask for help. This is the difference between an agent that occasionally produces garbage output and one that consistently operates within its competence envelope.

Structural honesty creates trust. When an agent tells you 'I can't do this yet,' you know its 'yes' answers are meaningful. When enforcement backs that up programmatically, you get predictable behavior at scale.

How It Works

The implementation spans both phases. The agent evaluates incoming tasks against its capability model before execution. When it detects a mismatch — missing tools, insufficient context, or out-of-scope requirements — it flags the task and surfaces the specific limitation.
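To make the shape of that pre-execution check concrete, here is a minimal sketch. Everything in it is illustrative: `CapabilityModel`, `TaskCheck`, and the three mismatch categories are hypothetical names standing in for Sabine's internal capability model, not its actual API.

```python
from dataclasses import dataclass, field

@dataclass
class TaskCheck:
    ok: bool
    limitations: list[str]  # specific reasons the task was flagged

@dataclass
class CapabilityModel:
    known_tools: set = field(default_factory=set)
    max_context_tokens: int = 8000
    in_scope: set = field(default_factory=set)

    def evaluate(self, task: dict) -> TaskCheck:
        """Check a task against the model before execution; collect
        every specific limitation rather than failing on the first."""
        limitations = []
        missing = task["required_tools"] - self.known_tools
        if missing:
            limitations.append(f"missing tools: {sorted(missing)}")
        if task["context_tokens"] > self.max_context_tokens:
            limitations.append("insufficient context")
        if task["category"] not in self.in_scope:
            limitations.append(f"out of scope: {task['category']}")
        return TaskCheck(ok=not limitations, limitations=limitations)
```

The design choice worth noting: the check returns all detected limitations, not just the first, so the agent can surface the full reason a task was declined.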

The enforcement layer intercepts execution at decision points. If the agent attempts to proceed despite a flagged limitation, the system blocks the action and logs the attempt. This creates a feedback loop: every near-miss becomes data for improving the capability model.
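A sketch of what "block and log" can look like at a decision point, under the same caveat: `EnforcementLayer`, `BlockedAction`, and the audit-log structure are assumptions for illustration, not Sabine's real interfaces.

```python
class BlockedAction(Exception):
    """Raised when an agent tries to act despite a flagged limitation."""

class EnforcementLayer:
    def __init__(self):
        self.audit_log = []  # every near-miss becomes capability-model data

    def guard(self, action: str, flagged_limitations: list[str]) -> bool:
        """Allow the action only if nothing was flagged; otherwise
        record the attempt and refuse to proceed."""
        if flagged_limitations:
            self.audit_log.append({"action": action,
                                   "limitations": flagged_limitations})
            raise BlockedAction(f"{action} blocked: {flagged_limitations}")
        return True
```

Raising an exception rather than returning `False` is deliberate in this sketch: it makes proceeding past a flag impossible by default, which is the point of turning awareness into enforcement.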

What's Next

Phase 3 is already in development: adaptive learning. The system will use the enforcement feedback loop to automatically expand its capability model over time. When it successfully handles a task type repeatedly, that becomes part of its known competence. When it flags something consistently, that informs capability expansion priorities.
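The adaptive loop described above can be sketched as a pair of counters, one for successes and one for flags. The promotion threshold and all names here are assumptions for illustration, not shipped parameters.

```python
from collections import Counter

PROMOTE_AFTER = 5  # assumed threshold: successes before a task type is trusted

class AdaptiveCapabilities:
    def __init__(self):
        self.successes = Counter()
        self.flags = Counter()
        self.known = set()  # task types promoted into known competence

    def record_success(self, task_type: str) -> None:
        self.successes[task_type] += 1
        if self.successes[task_type] >= PROMOTE_AFTER:
            self.known.add(task_type)  # repeated success -> known competence

    def record_flag(self, task_type: str) -> None:
        self.flags[task_type] += 1

    def expansion_priorities(self) -> list[str]:
        # The most frequently flagged task types are the strongest
        # candidates for capability expansion.
        return [t for t, _ in self.flags.most_common()]
```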

We're also exploring cross-agent learning: one agent's capability discoveries propagating to others in the fleet. If one instance learns to handle a new task pattern, that knowledge should be available system-wide.

The goal isn't perfect agents. It's agents that know their limits and operate within them — until they learn to expand those limits programmatically. That's the foundation of truly autonomous systems.