Autonomous agents are only as good as their ability to learn from what they've done before. Strug Works v3.0 introduces two foundational capabilities that make our agent platform genuinely intelligent: a persistent memory system that enables learning across executions, and quality infrastructure that ensures agents stay reliable as they grow smarter.
The Memory Problem
Until now, every agent execution started from scratch. An agent might solve a complex problem—figure out a project structure, learn a team's conventions, debug a tricky integration—and then forget it all the moment the task ended. The next agent to tackle a similar problem would repeat the same discovery process.
This wasn't just inefficient. It prevented agents from developing the kind of institutional knowledge that makes human teams effective. We needed agents to remember—but memory at scale introduces new challenges.
What We Built
v3.0's memory system (now live in Strug Recall) gives every agent access to a shared knowledge base with scope-based organization. Memories can be global (available to all agents), role-specific (sc-backend remembers backend patterns), or task-specific (context for a single execution).
Each memory carries a confidence score and timestamp. Agents automatically surface the most relevant, highest-confidence memories when making decisions. Memories decay over time unless they're reinforced by repeated access—mimicking how human teams forget outdated practices and retain useful patterns.
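The scope, confidence, and decay behavior described above can be sketched roughly as follows. This is an illustrative model, not the shipped implementation: the field names, the exponential decay curve, and the 30-day half-life are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

HALF_LIFE_DAYS = 30.0  # assumed half-life; the real decay curve may differ


@dataclass
class Memory:
    content: str
    scope: str            # e.g. "global", "role:sc-backend", "task:<id>"
    confidence: float     # 0.0 - 1.0
    updated_at: float = field(default_factory=time.time)

    def effective_confidence(self, now: Optional[float] = None) -> float:
        """Confidence decays exponentially with age since last access."""
        now = time.time() if now is None else now
        age_days = (now - self.updated_at) / 86400
        return self.confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)

    def reinforce(self) -> None:
        """Repeated access refreshes the timestamp, resetting decay."""
        self.updated_at = time.time()


m = Memory("API uses snake_case routes", scope="role:sc-backend", confidence=0.9)
fresh = m.effective_confidence()
# One assumed half-life later, effective confidence has halved.
stale = m.effective_confidence(now=m.updated_at + 30 * 86400)
```

Under this model, a memory that nobody reads fades toward zero, while `reinforce()` on each access keeps frequently used patterns near full strength.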
The technical implementation uses Supabase for storage with indexed queries on scope, confidence, and recency. Agents call memory_read and memory_write functions that handle retrieval ranking and automatic timestamp updates. It's simple by design—we wanted agents to use memory naturally, without complex prompting.
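A minimal sketch of that read/write path, with an in-memory list standing in for the Supabase table (the real functions issue indexed queries; the ranking key here, confidence first with recency as tie-breaker, is an assumption):

```python
import time
from typing import Any, Dict, List

_STORE: List[Dict[str, Any]] = []  # stand-in for the Supabase "memories" table


def memory_write(content: str, scope: str, confidence: float) -> None:
    """Insert a memory row; in production this would be an indexed table insert."""
    _STORE.append({
        "content": content,
        "scope": scope,
        "confidence": confidence,
        "updated_at": time.time(),
    })


def memory_read(scope: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Return top memories visible to a scope, ranked by confidence and recency,
    refreshing the timestamp on each row returned (automatic reinforcement)."""
    candidates = [m for m in _STORE if m["scope"] in (scope, "global")]
    candidates.sort(key=lambda m: (m["confidence"], m["updated_at"]), reverse=True)
    top = candidates[:limit]
    for m in top:
        m["updated_at"] = time.time()  # access counts as reinforcement
    return top


memory_write("Use pytest for backend tests", "role:sc-backend", 0.8)
memory_write("Monorepo root is /app", "global", 0.95)
results = memory_read("role:sc-backend")
```

Note that `memory_read` returns global memories alongside role-scoped ones, which is one plausible reading of the scope hierarchy described above.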
Quality Infrastructure
Memory makes agents smarter, but it also introduces risk. If an agent remembers incorrect information, that mistake can propagate across executions. v3.0 pairs memory with a quality and evaluation framework designed to catch problems before they compound.
Every pull request now requires test evidence in the description—human-readable proof that the code works. Agents run automated checks before committing. We've instrumented evaluation pipelines that measure output consistency, success rates, and adherence to coding standards across agent roles.
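As an illustration, a pre-merge check for test evidence might look like the following. The "Test Evidence" heading convention is hypothetical, not Strug's actual CI rule:

```python
import re


def has_test_evidence(pr_description: str) -> bool:
    """Accept a PR description only if it contains a test-evidence heading
    followed by at least one non-empty line (a hypothetical convention)."""
    match = re.search(r"^#+\s*Test Evidence\s*$", pr_description,
                      re.IGNORECASE | re.MULTILINE)
    if not match:
        return False
    # Require some actual content after the heading, not just the header itself.
    return bool(pr_description[match.end():].strip())


ok = has_test_evidence("Adds retry logic.\n\n## Test Evidence\n- pytest: 42 passed")
missing = has_test_evidence("Fixes login bug.")
```

A check like this runs cheaply in CI and fails the build before a reviewer ever sees an unevidenced PR.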
These aren't aspirational metrics. They're enforced in CI and visible in Strug Central's Dashboard. When an agent's quality score drops, the system flags it for investigation. We're treating agent reliability with the same rigor we'd apply to any production service.
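The flag-on-drop behavior could be expressed roughly as below; the metric names, weights, and threshold are all assumptions for illustration, chosen to mirror the consistency, success-rate, and standards metrics mentioned above:

```python
from dataclasses import dataclass

FLAG_THRESHOLD = 0.75  # assumed cutoff below which an agent is flagged


@dataclass
class QualityMetrics:
    success_rate: float         # fraction of executions completing successfully
    output_consistency: float   # agreement across repeated runs
    standards_adherence: float  # fraction of lint/style checks passed


def quality_score(m: QualityMetrics) -> float:
    """Weighted blend of the three tracked metrics (weights are assumed)."""
    return (0.5 * m.success_rate
            + 0.3 * m.output_consistency
            + 0.2 * m.standards_adherence)


def needs_investigation(m: QualityMetrics) -> bool:
    return quality_score(m) < FLAG_THRESHOLD


healthy = QualityMetrics(0.95, 0.9, 0.9)
degraded = QualityMetrics(0.6, 0.7, 0.8)
```

Treating the score as a single number makes it easy to chart on a dashboard and to alert on, the same way an SLO burn rate would be handled for any production service.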
What's Next
v3.0 establishes the foundation. The next phase focuses on memory intelligence: automatic pattern extraction from successful executions, cross-agent learning (when sc-frontend solves a problem, sc-backend can learn from it), and memory-driven task routing (assigning tasks to agents based on their accumulated expertise).
We're also expanding quality infrastructure to include output diff analysis, automated regression detection, and agent-generated improvement suggestions. The goal is a system that not only remembers but actively improves its own performance over time.
This is what makes autonomous agents genuinely useful in production: the ability to learn, remember, and get better—while staying reliably correct.