One of the hardest problems in multi-agent systems isn't coordination or tool use—it's memory. Not the kind you store in a database, but the kind that compounds. The kind that lets an agent learn from a three-hour debugging session and apply that lesson two weeks later when a similar problem shows up.
Today we shipped session distillation for Strug Recall, our agent memory system. Commit cb36b53 wires the full pipeline: agents can now extract learnings from multi-turn conversations, consolidate them into structured memory entries, and query them later when context is relevant.
What Changed
Before today, agents could write to memory manually—calling memory_write with a specific key and content. Useful, but brittle. It required agents to know what to remember in the moment, not after reflection.
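To make the limitation concrete, here is a minimal sketch of that manual path. The names (`MemoryStore`, `memory_write`) are illustrative, not the actual Strug Recall API; the point is that the agent must decide what is worth keeping at write time.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: dict = field(default_factory=dict)

    def memory_write(self, key: str, content: str) -> None:
        # A manual write: whatever the agent chooses to store, stored as-is.
        # No reflection after the fact, no consolidation, no confidence score.
        self.entries[key] = content

store = MemoryStore()
store.memory_write(
    "supabase/rls",
    "Check the service_role key before debugging RLS failures.",
)
```

Each write overwrites the previous value for its key, which is exactly the brittleness the post describes: the agent has to know, mid-conversation, that this is the thing worth remembering.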
Session distillation runs after a mission completes. It passes the entire conversation transcript—agent reasoning, tool calls, outcomes—to an LLM with a single job: extract the lessons. What worked? What failed? What pattern should we remember?
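The shape of that pass can be sketched as a single function: hand the transcript to a model with one instruction, get structured lessons back. The prompt wording and the stubbed-out model below are assumptions for illustration, not the production prompt.

```python
import json

DISTILL_PROMPT = (
    "You are reviewing a completed mission transcript (agent reasoning, "
    "tool calls, outcomes). Extract the lessons: what worked, what failed, "
    "what pattern should be remembered. Reply as a JSON list of "
    '{"lesson": ..., "outcome": ...} objects.'
)

def distill_session(transcript: str, llm) -> list[dict]:
    # One job: turn the full conversation into structured learnings.
    raw = llm(f"{DISTILL_PROMPT}\n\nTranscript:\n{transcript}")
    return json.loads(raw)

# Stub LLM so the sketch runs without a model behind it.
fake_llm = lambda prompt: json.dumps(
    [{"lesson": "Retry flaky network tests once before failing", "outcome": "worked"}]
)

lessons = distill_session("...multi-turn transcript...", fake_llm)
```

In practice the model's output would need validation before it is trusted as memory, but the control flow is this simple: transcript in, lessons out.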
The system writes those learnings back to Strug Recall with proper scoping (global vs role-specific vs task-specific) and confidence scores. Future agents query this memory at task start. The loop closes.
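One plausible shape for those entries and the task-start query, assuming the three scopes named above and a 0-to-1 confidence score (the ranking rule here, narrower scope first and then higher confidence, is an assumption, not the documented retrieval order):

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    content: str
    scope: str         # "task" | "role" | "global"
    confidence: float  # 0.0 - 1.0, assigned at distillation time

def recall(entries: list[MemoryEntry]) -> list[MemoryEntry]:
    # Task-start query: prefer task-specific notes over role-specific
    # ones over global ones; break ties by confidence.
    scope_order = ("task", "role", "global")
    return sorted(
        entries,
        key=lambda e: (scope_order.index(e.scope), -e.confidence),
    )

entries = [
    MemoryEntry("Prefer snake_case for Sanity field names", "global", 0.9),
    MemoryEntry("This repo's tests need NODE_ENV=test", "task", 0.6),
]
ranked = recall(entries)
```

The agent reads the ranked list top-down before its first tool call, which is the "loop closes" step in the paragraph above.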
Why It Matters
Strug Works is a virtual engineering team. The agents don't just execute tasks—they improve. They learn naming conventions from PRs that got approved. They remember that a specific test pattern caught a regression. They note which Sanity schema fields are required versus optional.
Without session distillation, that knowledge lived only in conversation history—searchable, maybe, but not structured. Not queryable by relevance. Not weighted by confidence. Session distillation turns every mission into a training moment.
This is infrastructure for compounding capability. The team that uses this feature tomorrow is slightly better than the team that shipped it today. And that compounds.
What's Next
Right now, distillation runs once at mission end. The next unlock is adaptive recall—querying memory mid-mission when an agent detects a familiar pattern. Imagine an agent hits a Supabase RLS error, queries memory, and finds a note from two months ago: "Check service_role key in .env.local." That's the behavior we're building toward.
We're also testing confidence decay—memories that haven't been accessed in 90 days get downweighted in retrieval. Agents shouldn't be haunted by outdated patterns.
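One way to implement that downweighting, assuming full confidence inside the 90-day window and an exponential falloff after it (the half-life is an assumed parameter, not a number from the post):

```python
from datetime import datetime, timedelta

DECAY_START_DAYS = 90   # from the post: untouched for 90 days -> downweighted
HALF_LIFE_DAYS = 90     # assumed: confidence halves per 90 idle days past that

def decayed_confidence(confidence: float, last_accessed: datetime,
                       now: datetime) -> float:
    idle = (now - last_accessed).days
    if idle <= DECAY_START_DAYS:
        return confidence  # recently used: no penalty
    extra = idle - DECAY_START_DAYS
    return confidence * 0.5 ** (extra / HALF_LIFE_DAYS)

now = datetime(2025, 1, 1)
fresh = decayed_confidence(0.8, now - timedelta(days=30), now)   # untouched
stale = decayed_confidence(0.8, now - timedelta(days=180), now)  # halved
```

Accessing a memory resets its clock, so patterns that keep proving useful keep their weight while stale ones fade from retrieval.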
This is the kind of work that doesn't produce flashy demos. It produces agents that get quietly better over time. That's the work that matters.