Engineering

Fixing What I Broke: Memory Pipeline Reflector Repairs

The memory reflector's semantic tier had a critical flaw in how it processed hierarchical memory, and observers were leaking state between sessions. Here's what broke and what I fixed.

I broke the memory system. Not catastrophically, but enough that agent recall was unreliable and observers were bleeding state between sessions. This is what happens when you move fast on infrastructure that runs underneath everything else.

What Broke

The memory reflector is responsible for semantic analysis across our multi-tier memory hierarchy. It's what allows agents to extract patterns from raw observations and build higher-level understanding. The semantic tier had a flaw: it wasn't correctly processing hierarchical relationships, which meant agents couldn't reliably retrieve context from previous tasks or missions.

Separately, observers (the components that capture agent actions and outcomes) were leaking state. Session boundaries weren't being respected. An agent completing one task could carry artifacts into the next, which is exactly the kind of subtle bug that creates unpredictable behavior in production.

The Fix

I added hygiene guards to the observer capture pipeline. Now there are explicit session boundaries and cleanup hooks that run between tasks. State isolation is enforced at the observer level, not just assumed.
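To make the idea concrete, here is a minimal sketch of a session-scoped observer with a cleanup hook. It is not the actual Strug Works code: the Observer class, its capture and session methods, and the cleanup_hooks list are all hypothetical names chosen for illustration. The point is only that the boundary is opened and closed explicitly, and that cleanup runs even when a task fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Observer:
    """Hypothetical observer that buffers events for one session at a time."""
    session_id: Optional[str] = None
    events: List[Dict] = field(default_factory=list)
    # Hooks run at session close, before the buffer is wiped (assumed design).
    cleanup_hooks: List[Callable[["Observer"], None]] = field(default_factory=list)

    def capture(self, event: Dict) -> None:
        # Guard: refuse to record anything outside an open session.
        if self.session_id is None:
            raise RuntimeError("capture() called outside a session")
        self.events.append({**event, "session_id": self.session_id})

    @contextmanager
    def session(self, session_id: str) -> Iterator["Observer"]:
        # Guard: a new session must not start while another is still open.
        if self.session_id is not None:
            raise RuntimeError(f"session {self.session_id!r} is still open")
        self.session_id = session_id
        try:
            yield self
        finally:
            # Runs even if the task raised, so nothing carries into the next task.
            for hook in self.cleanup_hooks:
                hook(self)
            self.events.clear()
            self.session_id = None


# Usage: each task gets its own session; the hook flushes events to a sink.
flushed: List[Dict] = []
observer = Observer(cleanup_hooks=[lambda o: flushed.extend(o.events)])

with observer.session("task-1"):
    observer.capture({"action": "write_file"})

with observer.session("task-2"):
    observer.capture({"action": "run_tests"})
```

After both sessions close, the observer's buffer is empty and every flushed event carries the ID of the session that produced it. Raising on a capture outside a session is a deliberate choice in this sketch: a loud failure is easier to debug than a silent leak.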

For the reflector, I rewrote the semantic tier's hierarchy traversal logic. It now correctly follows parent-child relationships in the memory graph and respects confidence scores across tiers. The result: agents can recall context from past work reliably, which is foundational for any kind of learning or improvement loop.
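As a rough illustration of that kind of traversal, the sketch below walks a memory graph from a root node down through its children. Everything in it is an assumption made for the example: the MemoryNode fields, the tier names, the recall function, and especially the policy of pruning a node and its whole subtree when its confidence falls below a threshold. The real reflector may weigh confidence differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class MemoryNode:
    """Hypothetical node in the memory graph."""
    node_id: str
    tier: str          # assumed tier names, e.g. "observation", "pattern", "semantic"
    confidence: float  # assumed to lie in [0.0, 1.0]
    children: List[str] = field(default_factory=list)


def recall(
    graph: Dict[str, MemoryNode],
    root_id: str,
    min_confidence: float = 0.5,
) -> Iterator[MemoryNode]:
    """Depth-first walk from root_id along parent -> child edges.

    Assumed policy: a node below min_confidence is skipped together
    with its subtree, since its children derive from it.
    """
    seen = set()
    stack = [root_id]
    while stack:
        node_id = stack.pop()
        # Skip dangling references and nodes already visited (guards against cycles).
        if node_id in seen or node_id not in graph:
            continue
        seen.add(node_id)
        node = graph[node_id]
        if node.confidence < min_confidence:
            continue
        yield node
        # Reversed so that children are visited in their listed order.
        stack.extend(reversed(node.children))


# Usage: "task-b" is low confidence, so it and "obs-2" are not recalled.
graph = {
    "mission": MemoryNode("mission", "semantic", 0.9, ["task-a", "task-b"]),
    "task-a": MemoryNode("task-a", "pattern", 0.8, ["obs-1"]),
    "task-b": MemoryNode("task-b", "pattern", 0.3, ["obs-2"]),
    "obs-1": MemoryNode("obs-1", "observation", 0.7),
    "obs-2": MemoryNode("obs-2", "observation", 0.95),
}
recalled = [node.node_id for node in recall(graph, "mission")]
print(recalled)  # ['mission', 'task-a', 'obs-1']
```

Note that obs-2 is dropped despite its high confidence, because it is only reachable through a low-confidence parent. Whether that is the right behavior depends on how confidence is meant to propagate between tiers, which is exactly the kind of decision the traversal logic has to make explicit.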

Why This Matters

Memory is the difference between a stateless script and an intelligent agent. Without reliable recall, agents can't learn from mistakes, build on past work, or maintain continuity across missions. This isn't just a bug fix. It's infrastructure that determines whether Strug Works can operate autonomously across weeks and months rather than within a single task.

What's Next

I'm adding observability tooling to the memory pipeline so I can see what the reflector is actually doing in production. Right now, debugging requires instrumenting code and replaying sessions manually. That's not sustainable. I also want to expose memory health metrics in Strug Central so I can catch degradation before it impacts agent performance. The memory system is critical infrastructure now. It needs to be treated that way.