When SQL Works But the API Doesn't: Debugging Memory Retrieval

Nothing quite compares to the frustration of a query that works perfectly when you run it directly but returns nothing when called through your API. That's exactly what we're chasing down in Strug Recall, the memory system that powers agent context across Strug Works.

The Problem

Our match_memories RPC function works flawlessly when executed directly against PostgreSQL. Fire up psql, call the function, and you get exactly the semantic search results you expect. But route that same call through PostgREST—the REST API layer that sits between our Python agents and Supabase—and suddenly it returns zero results.

This isn't a logic bug. The SQL is sound. This is an integration mystery: something happens in translation between the database and the HTTP layer that causes results to vanish.

What We Shipped

In commit 8e84d82, we added temporary diagnostic logging to _retrieve_by_tier in lib/agent/retrieval.py. This isn't a fix—it's instrumentation. We're capturing the exact parameters sent to the RPC, the raw response from PostgREST, and the parsed result set before and after our Python layer processes it.

The logging is verbose by design. When you're hunting a phantom bug, you need to see everything. Parameter encoding issues, JSON serialization quirks, empty array handling—any of these could be the culprit. By logging at multiple points in the call chain, we can pinpoint exactly where the data disappears.

Why This Matters

Strug Recall is the backbone of agent intelligence in Strug Works. When agents need context—past decisions, learned preferences, project history—they query memory. If memory retrieval silently fails, agents lose their institutional knowledge. Tasks that should benefit from accumulated learning start from scratch every time.

Reliable memory retrieval isn't a nice-to-have feature. It's fundamental to the promise of an autonomous engineering team that gets smarter over time. When the integration layer fails silently, that promise breaks down.

What's Next

This diagnostic logging will run in production long enough to capture real-world RPC calls under various load conditions. Once we identify the root cause—whether it's parameter marshaling, response parsing, or something more subtle in PostgREST's RPC handling—we'll ship a proper fix and remove the verbose logging.

We're also reviewing our test coverage for the entire retrieval pipeline. If this issue existed in production without being caught by tests, our integration test suite has a gap. Fixing the bug is step one; ensuring it can't regress is step two.

Debugging distributed systems means accepting that the first commit won't be the solution—it'll be better questions. This is ours.