When you run an organization of autonomous agents, memory isn't just a feature—it's infrastructure. And like all infrastructure, it needs maintenance.
We shipped a memory hygiene pipeline this week for Strug Recall, our agent memory system. It does three things: normalizes timestamps across different data sources, detects and resolves contradictions between memory entries, and prunes information that has gone stale. Unglamorous work. Critical work.
Why Memory Gets Messy
Strug Works agents write to memory constantly. Every task completion, every learned constraint, every product decision gets recorded. That's by design—the memory system is how institutional knowledge compounds across missions.
But agents write memory from different contexts: task runners, webhook handlers, orchestration loops. Timestamps come in different formats. Sometimes two agents learn contradictory facts from different PRs. Information that was true last week becomes stale when we refactor. Without hygiene, the memory system accumulates noise.
What We Built
The hygiene pipeline runs as a scheduled job. It processes memory in three passes:
Temporal normalization standardizes all timestamps to UTC with millisecond precision. Sounds trivial. It's not. When you're trying to determine which of two contradictory facts is newer, timestamp precision matters.
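A minimal sketch of that normalization pass, assuming raw timestamps arrive as strings in a few known formats (the format list and function name here are illustrative, not Strug Recall's actual code):

```python
from datetime import datetime, timezone

# Illustrative set of formats seen across task runners, webhooks, etc.
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%f%z",   # ISO 8601 with offset and fractional seconds
    "%Y-%m-%dT%H:%M:%S%z",      # ISO 8601 with offset
    "%Y-%m-%d %H:%M:%S",        # naive stamp; assumed to be UTC
]

def normalize_timestamp(raw: str) -> str:
    """Parse a raw timestamp and emit UTC ISO 8601 at millisecond precision."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # treat naive stamps as UTC
        dt = dt.astimezone(timezone.utc)
        return dt.isoformat(timespec="milliseconds")
    raise ValueError(f"unrecognized timestamp format: {raw!r}")
```

The trap is the naive-timestamp branch: two sources logging the same event in different local conventions can disagree by hours, which is exactly what breaks "which fact is newer" comparisons downstream.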
Contradiction resolution compares memory entries within the same scope and key namespace. If two entries conflict, the pipeline flags them, compares confidence scores and recency, and marks the older or lower-confidence entry as superseded. It doesn't delete—it annotates. Agents can still see the history if they need it.
Staleness detection looks for entries that haven't been accessed in 90 days and match patterns we've identified as high-churn (e.g., API endpoint documentation that's been refactored, temporary workarounds that got resolved). These get flagged for human review before archival.
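A sketch of that staleness pass, assuming each entry records a last-access timestamp and the high-churn patterns are plain regexes (both assumptions; the real pipeline's representation may differ):

```python
import re
from datetime import datetime, timedelta, timezone

# Illustrative high-churn patterns, matched case-insensitively.
HIGH_CHURN_PATTERNS = [
    re.compile(r"api[-_ ]endpoint", re.I),        # refactored endpoint docs
    re.compile(r"temp(orary)?\s+workaround", re.I),
]

STALE_AFTER = timedelta(days=90)

def flag_for_review(entries, now=None):
    """Return entries untouched for 90+ days whose key matches a churn pattern."""
    now = now or datetime.now(timezone.utc)
    flagged = []
    for e in entries:
        last = datetime.fromisoformat(e["last_accessed"])
        if now - last >= STALE_AFTER and any(
            p.search(e["key"]) for p in HIGH_CHURN_PATTERNS
        ):
            flagged.append(e)  # queued for human review before archival
    return flagged
```

Note the two conditions are ANDed: age alone doesn't flag an entry, because plenty of old facts (constraints, decisions) stay true indefinitely.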
Why This Matters
Agent performance degrades when memory is polluted. LLMs struggle with contradictory context. They either pick the wrong fact or hedge and lose confidence. Both outcomes slow down task execution.
This pipeline keeps Strug Recall clean without manual intervention. That's the standard: infrastructure that maintains itself is the only kind that scales in a one-person organization.
What's Next
The current pipeline is rule-based. It catches the patterns we've seen. It doesn't adapt to new patterns.
Next iteration: semantic contradiction detection using embeddings. Instead of exact key matching, compare the meaning of memory entries. Two facts might use different terminology but express the same underlying constraint. Or they might look similar but actually describe different subsystems. Embeddings will let us surface those cases.
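The pairing step might look like the sketch below. The `embed` function here is a toy bag-of-words stand-in so the example is self-contained; the planned version would call a real embedding model, and the 0.6 threshold is an arbitrary illustration:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words vector; a production version would call a model.
    vec: dict[str, float] = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0.0) + 1.0
    return vec

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def candidate_pairs(entries, threshold=0.6):
    """Yield entry pairs similar enough to check for contradiction."""
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            if cosine(embed(a), embed(b)) >= threshold:
                yield a, b
```

The threshold cuts both ways, matching the two failure modes above: too low and unrelated subsystems get paired, too high and a rephrased constraint slips through unpaired.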
We're also building confidence decay. Memory entries should lose confidence over time unless they're reinforced by repeated access or validation. A fact that was true in January might still sit in the database in March while being operationally irrelevant. Confidence decay makes that explicit.
Memory hygiene is not the glamorous part of building autonomous agents. But it's the part that lets them keep working when you're not watching. And that's the entire point.