Engineering · Jan 10, 2025

What Nobody Tells You About Running an Autonomous Agent Team in Production

Six months into running Strug Works as our fully autonomous engineering team, here's what actually matters when agents ship real code to production systems—and what surprised me most.

I'm writing this from the other side of a decision that seemed insane six months ago: letting an autonomous agent team write, review, and ship production code without me touching the keyboard.

Strug Works—our autonomous engineering team—has now shipped hundreds of commits across multiple products. They've built features for Sabine (our AI partnership platform), refactored backend systems, fixed schema misalignments I didn't even know existed, and documented everything along the way. Not demos. Not proofs of concept. Production systems serving real users.

Here's what actually matters when you run an engineering organization of agents—and what caught me completely off guard.

Memory Is Not Optional—It's the Foundation

The first thing that broke at scale wasn't code quality or test coverage. It was context retention.

Early on, I'd dispatch a mission to fix a schema alignment issue. The backend agent would solve it beautifully. Two weeks later, a different agent would introduce a similar misalignment in a new feature. The learning didn't transfer. Each agent started from zero.

We built Strug Recall—our organizational memory system—not as a nice-to-have feature but as critical infrastructure. Now when an agent encounters a problem, solves it, and documents the solution, that knowledge is accessible to every other agent. The sc-backend agent's learnings about enum validation flow to the sc-content-writer when they're documenting similar features. The frontend team's discoveries about state management patterns inform new component work.
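To make that concrete, here is a minimal sketch of the idea behind a shared lesson store: agents record what they learned under topic tags, and any other agent can pull those lessons before starting related work. The class and method names are illustrative assumptions, not Strug Recall's actual interface.

```python
from collections import defaultdict

class OrgMemory:
    """Minimal sketch of an organizational memory store.

    Agents record lessons under topic tags; any agent can later
    recall every lesson filed under a tag, regardless of who wrote it.
    """

    def __init__(self):
        self._lessons = defaultdict(list)  # topic -> [(agent, lesson), ...]

    def record(self, agent: str, topic: str, lesson: str) -> None:
        """File a lesson under a topic tag, attributed to the agent."""
        self._lessons[topic].append((agent, lesson))

    def recall(self, topic: str) -> list[str]:
        """Return all lessons filed under a topic, across all agents."""
        return [lesson for _, lesson in self._lessons[topic]]

memory = OrgMemory()
memory.record("sc-backend", "enum-validation",
              "Validate subsystem enums against the core model at build time.")

# Weeks later, a different agent checks the shared memory before building
# a similar feature, instead of starting from zero:
print(memory.recall("enum-validation"))
```

The point of the design is the lookup path, not the storage: the value comes from every agent querying the same store before acting, so one agent's debugging session becomes another agent's starting context.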

Think of it like this: a human engineering team doesn't just document in Confluence and hope someone reads it. They have hallway conversations, code reviews, and lunch discussions where context spreads organically. Agents need an equivalent mechanism. Memory isn't a feature—it's how the team learns to work together.

The Smallest Misalignments Compound Fastest

Last week we tracked down a bug that had been hiding in plain sight: a legal document ingest pipeline was using an enum value ('legal_ingest') that didn't exist in the core data model. The pipeline worked. Tests passed. But the data was effectively invisible to downstream systems.

This taught me something crucial: in an autonomous system, you can't rely on a senior engineer noticing that something 'feels wrong' during code review. Agents are excellent at following specifications, but they inherit your blind spots. If your schema documentation is out of sync with your implementation, agents will confidently build on top of the inconsistency.

The fix wasn't just updating the enum. It was adding validation layers that catch these misalignments before they ship. Now when a specialized subsystem references an enum, we validate that the value exists in the core model during the build process. The agents learned to do this automatically—but only after we encoded the lesson into their operating procedures.
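A build-time check like that can be very small. The sketch below assumes a core model that exposes its valid enum values as a set; the function and variable names are hypothetical, but the shape matches the check described above: fail the build if a subsystem references a value the core model doesn't define.

```python
# Hypothetical core data model: the set of document types it recognizes.
CORE_DOCUMENT_TYPES = {"contract", "invoice", "partnership_agreement"}

def validate_enum_refs(subsystem: str, referenced: set[str], core: set[str]) -> None:
    """Fail the build if a subsystem references enum values the core model lacks."""
    unknown = referenced - core
    if unknown:
        raise ValueError(
            f"{subsystem} references enum values missing from the core model: "
            f"{sorted(unknown)}"
        )

# The bug described above: a pipeline shipping a value the core model
# never defined. Tests passed, but downstream systems couldn't see the data.
try:
    validate_enum_refs("legal_ingest_pipeline", {"legal_ingest"}, CORE_DOCUMENT_TYPES)
except ValueError as err:
    print(err)
```

Run as part of the build, this turns a silent data-visibility bug into a hard failure at the point where the misalignment is introduced, rather than weeks later.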

Documentation Becomes Mission-Critical Infrastructure

With a human team, outdated docs are annoying. With an agent team, they're existential.

We implemented a practice I call 'commit-as-you-go documentation.' Every logical unit of work—a new file, a modified function, a schema change—gets documented immediately, before the agent moves to the next task. No waiting until the end of the sprint. No 'we'll document it later.'

This had an unexpected benefit: the act of documenting forces clarity. An agent documenting a schema change has to articulate why the change matters, what downstream systems are affected, and what future developers need to know. That moment of articulation catches design issues before they propagate.

Our content writer agent has a specific instruction: for every technical fix or schema alignment, create both a stream entry (immediate visibility) and a longer blog post or doc (deeper technical context). This dual-level documentation serves different audiences—engineers who need quick facts versus future team members who need to understand the 'why.'
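One way to picture that dual-level instruction is as a single documentation hook that always produces both artifacts from one event, so neither level can be skipped. This is a sketch under assumed names, not the content writer agent's actual tooling.

```python
def document_change(summary: str, why: str, affected: list[str]) -> tuple[str, str]:
    """Hypothetical dual-output documentation hook.

    One change event yields both a short stream entry (immediate
    visibility) and a longer doc stub (deeper technical context).
    """
    stream_entry = f"[fix] {summary}"
    long_doc = "\n".join([
        f"# {summary}",
        f"Why it matters: {why}",
        f"Affected systems: {', '.join(affected)}",
    ])
    return stream_entry, long_doc

entry, doc = document_change(
    "Align legal ingest enum with core model",
    "Ingested legal documents were invisible to downstream systems.",
    ["ingest pipeline", "search", "analytics"],
)
print(entry)
```

Coupling the two outputs in one call is the design choice: an agent can't post the quick fact without also drafting the 'why' that future team members will need.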

Business Impact Over Implementation Details

Here's where running an agent team forces you to clarify your own thinking: agents are exceptional at solving the problem you specify, but you have to specify the right problem.

We recently improved data extraction accuracy in Sabine's partnership workflow. My initial brief focused on technical accuracy—extraction precision, field validation, error rates. The agent delivered exactly that. But when we documented the change, I realized I hadn't articulated the business impact: this improvement means partnership managers spend 40% less time correcting extracted data, which means they can manage more partnerships, which means faster revenue scaling.

Now our briefing template requires both implementation goals AND business impact framing. It makes me a better product leader—and it ensures that when agents document their work, they're telling the story that matters to stakeholders, not just to other engineers.
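A template that *requires* both framings can be enforced mechanically. The sketch below makes the business-impact field structurally mandatory: a brief missing it is rejected before any agent picks up the mission. Field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class MissionBrief:
    """Hypothetical mission brief: both framings are required fields."""
    title: str
    implementation_goals: str
    business_impact: str  # required, not an optional afterthought

    def __post_init__(self):
        if not self.business_impact.strip():
            raise ValueError(
                f"Brief '{self.title}' rejected: business impact framing is required."
            )

brief = MissionBrief(
    title="Improve extraction accuracy in partnership workflow",
    implementation_goals="Raise extraction precision; tighten field validation.",
    business_impact="Partnership managers spend less time correcting data, "
                    "so they can manage more partnerships.",
)
```

The enforcement lives in the template, not in anyone's discipline, which is the same move as the enum validation above: encode the lesson into the system rather than hoping it's remembered.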

What's Next: Teaching the Team to Teach Itself

Six months in, Strug Works ships production code daily. But we're still in the early stages of something bigger.

Right now, I'm the interface between business intent and technical execution. I write the mission briefs. I prioritize the work. I decide what matters. The agents execute brilliantly, but they're executing my vision.

The next frontier is building agents that can propose missions, not just complete them. Agents that notice patterns in customer feedback and spec new features. Agents that see a performance bottleneck and autonomously plan the optimization work. Agents that teach other agents based on their own learnings, without me as the intermediary.

We're building the foundation for that now. Strug Recall is evolving from a memory system into a learning system. The Dispatcher is gaining the ability to synthesize patterns across completed missions and suggest new ones. The agent team is learning to learn.

This isn't the future of work—it's the present, happening in production, right now. And it's messy and exhilarating and nothing like I expected.

I'm documenting every step because I don't think anyone else is running an organization quite like this. And if you're thinking about building with autonomous agents—not just building tools for them, but actually running your operations with them—I want you to know what it really looks like on the other side of that decision.

— Ryan Knollmaier, Founder & CTO, Strug City