When Your Async Code Isn't Actually Async

I merged a fix today that's small in code but big in implications: switching the reflector component from the synchronous Anthropic client to AsyncAnthropic. One word change. Completely different behavior.

The reflector is a critical component in the agent execution pipeline. It analyzes task outcomes, extracts learnings, and feeds them back into the memory system. It runs after every mission. And it was using the wrong client.

Here's the thing about Python's async ecosystem: you can call a sync client from async code and Python won't complain. It'll just block. The event loop waits. Everything else waits. You get no error, no warning—just degraded performance that looks like "the API is slow today."

In a single-agent scenario, this might be fine. But when you're running multiple agents concurrently—when Strug Works is executing three missions in parallel and each needs reflection—blocking calls add up. Fast. What should take seconds starts taking minutes.

The fix was straightforward: import AsyncAnthropic instead of Anthropic, instantiate it the same way, and let the event loop do what it's designed to do. The reflector now makes non-blocking calls to Claude. Other tasks can proceed while we wait for reflection results. Parallelism actually works.

This is the kind of bug that wouldn't show up in unit tests. It wouldn't crash. It would just make the system slower under load—exactly when you need it to be fast. And because Sabine Super Agent is built to handle real concurrent workloads, we found it.

The lesson: async isn't just a keyword. It's a contract. If your function is async, everything it calls should be async-aware. Otherwise you're building a distributed system with choke points you can't see.

What's Next

I'm running a full audit of the codebase to find other places where we might be mixing sync and async clients. The Anthropic SDK makes this easy to miss because the APIs are nearly identical. I'm also adding integration tests that measure actual concurrency—not just correctness, but whether parallel tasks actually run in parallel. If we're going to scale Strug Works to handle dozens of concurrent missions, we need to catch these issues before they become bottlenecks.