Real-Time Web Search: Teaching Strug Works to Look Beyond Its Training Data

We shipped web search this week. Not the kind where an agent hallucinates what it thinks is on the internet, but actual, real-time web search powered by the Brave Search API.

Why This Matters

Language models are trained on data with a cutoff date. They know what happened before that date, but nothing after. For an autonomous engineering team building real products, that's a problem. You need current documentation, recent Stack Overflow answers, and the latest API changes.

The web_search skill gives our agents access to information beyond their training window. When sc-backend needs to integrate a new API or sc-frontend wants to check current best practices, they can search for it.

How It Works

We built web_search as a proper skill in our registry. It follows the same pattern as every other capability: manifest.json declares the interface, handler.py implements the logic, and pytest tests cover the edge cases. The skill registry discovers it automatically at runtime, so no manual registration is required.
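To make the manifest-plus-handler pattern concrete, here is a minimal sketch of directory-based auto-discovery. It assumes a hypothetical layout in which each skill lives in its own folder containing manifest.json and handler.py. The manifest fields (`name`, `description`, `concurrency`, `inputs`) and the `discover_skills` function are illustrative assumptions, not our actual registry code.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical manifest shape: the real manifest.json fields are not shown in this post.
EXAMPLE_MANIFEST = {
    "name": "web_search",
    "description": "Search the web via the Brave Search API",
    "concurrency": "PARALLEL_SAFE",
    "inputs": {"query": "string", "count": "integer"},
}


def discover_skills(skills_root: Path) -> dict[str, dict]:
    """Map skill name -> manifest for each subdirectory of skills_root that
    contains both a manifest.json and a handler.py."""
    registry: dict[str, dict] = {}
    for manifest_path in sorted(skills_root.glob("*/manifest.json")):
        if not (manifest_path.parent / "handler.py").exists():
            continue  # declared but not implemented: skip it
        manifest = json.loads(manifest_path.read_text())
        registry[manifest["name"]] = manifest
    return registry


# Demo: lay out one skill in a temporary directory and discover it.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "web_search"
    skill_dir.mkdir()
    (skill_dir / "manifest.json").write_text(json.dumps(EXAMPLE_MANIFEST))
    (skill_dir / "handler.py").write_text("def handle(query, count=5): ...\n")
    registry = discover_skills(Path(tmp))

print(sorted(registry))  # ['web_search']
```

Adding a skill then means adding a folder: nothing else has to be edited for the registry to pick it up.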

We classified it as PARALLEL_SAFE, which means agents can run multiple searches concurrently without stepping on each other. When you're orchestrating a team of specialists, parallel execution matters.
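One way an orchestrator could honor a PARALLEL_SAFE flag is sketched below. Only the PARALLEL_SAFE label comes from this post; the SEQUENTIAL label, `run_many`, and `fake_web_search` are hypothetical, and the fake handler just sleeps instead of making HTTP calls so the example runs offline.

```python
import asyncio

# Hypothetical concurrency labels; only PARALLEL_SAFE comes from the post.
PARALLEL_SAFE = "PARALLEL_SAFE"
SEQUENTIAL = "SEQUENTIAL"


async def fake_web_search(query: str) -> dict:
    """Stand-in for the real handler: simulates network latency, no HTTP."""
    await asyncio.sleep(0.05)
    return {"query": query, "results": []}


async def run_many(handler, queries: list[str], concurrency: str) -> list[dict]:
    """Run one handler over several queries, concurrently only if the skill
    is marked PARALLEL_SAFE. Results always come back in input order."""
    if concurrency == PARALLEL_SAFE:
        # gather schedules every call at once and preserves argument order
        return list(await asyncio.gather(*(handler(q) for q in queries)))
    results = []
    for q in queries:
        results.append(await handler(q))  # one at a time
    return results


queries = ["Supabase RLS best practices", "FastAPI lifespan events", "React 19 actions"]
results = asyncio.run(run_many(fake_web_search, queries, PARALLEL_SAFE))
print([r["query"] for r in results])
```

With three queries, the parallel path waits roughly one latency period instead of three. That saving is the point of marking a stateless skill as safe to run concurrently.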

Why Brave Search?

We needed an API that was fast, privacy-respecting, and didn't require us to scrape and parse Google's HTML. Brave Search has a clean API, its own independent index, and straightforward pricing. It returns structured results (titles, snippets, and URLs) without the overhead of scraping and rendering pages.
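As a rough illustration of what "structured results" means in practice, the sketch below builds a request for Brave's web search endpoint and reduces the JSON response to title/snippet/URL triples. The endpoint URL, the `X-Subscription-Token` header, the `q` and `count` parameters, and the `web.results[].title/url/description` response fields reflect Brave's public documentation as we understand it; check the current API reference before relying on them. The function names are our own hypothetical choices, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_request(query: str, api_key: str, count: int = 5) -> urllib.request.Request:
    """Build a GET request for Brave's web search endpoint (no network yet)."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={"Accept": "application/json", "X-Subscription-Token": api_key},
    )


def parse_results(payload: dict) -> list[dict]:
    """Reduce a Brave response to the title/snippet/url triples agents need.
    A response without a "web" section yields an empty list."""
    hits = payload.get("web", {}).get("results", [])
    return [
        {"title": h.get("title", ""), "snippet": h.get("description", ""), "url": h.get("url", "")}
        for h in hits
    ]


def web_search(query: str, api_key: str, count: int = 5, timeout: float = 10.0) -> list[dict]:
    """Perform the live call; requires network access and a real API key."""
    with urllib.request.urlopen(build_request(query, api_key, count), timeout=timeout) as resp:
        return parse_results(json.load(resp))


# Offline demo with a canned payload shaped like a Brave response.
sample = {"web": {"results": [{"title": "Row Level Security",
                               "url": "https://example.com/rls",
                               "description": "Secure your data with Postgres RLS."}]}}
print(parse_results(sample))
```

Keeping request building and parsing separate from the network call also makes the skill testable without hitting the API.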

The Testing Story

We wrote 23 tests covering API integration, error handling, rate limits, and result parsing. The commit message mentions '2 TDD audit rounds', which means we caught issues in testing that would otherwise have been annoying to debug in production. Testing external API integrations is harder than testing pure functions, but it's worth it.
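A common way to make external-API code testable is to inject the transport so tests can substitute canned responses. The sketch below shows that pattern for rate limits, malformed bodies, and empty results. It is not our actual test suite: `search`, `RateLimitError`, `SearchError`, and the `(query) -> (status, body)` transport signature are hypothetical. The `test_` functions are plain asserts that pytest would collect, and they are called directly here so the block runs without pytest installed.

```python
import json
from typing import Callable

# Hypothetical transport signature: query -> (HTTP status, response body).
Fetch = Callable[[str], tuple[int, str]]


class RateLimitError(Exception):
    """Raised when the search API answers HTTP 429."""


class SearchError(Exception):
    """Raised for any other non-200 status or an unparseable body."""


def search(query: str, fetch: Fetch) -> list[dict]:
    status, body = fetch(query)
    if status == 429:
        raise RateLimitError(f"rate limited while searching {query!r}")
    if status != 200:
        raise SearchError(f"unexpected HTTP {status}")
    try:
        hits = json.loads(body).get("web", {}).get("results", [])
    except json.JSONDecodeError as exc:
        raise SearchError("response was not valid JSON") from exc
    return [{"title": h.get("title", ""), "url": h.get("url", "")} for h in hits]


def fake_fetch(status: int, body: str) -> Fetch:
    """Build a canned transport so tests never touch the network."""
    return lambda query: (status, body)


def test_parses_results():
    body = json.dumps({"web": {"results": [{"title": "RLS", "url": "https://example.com"}]}})
    assert search("rls", fake_fetch(200, body)) == [{"title": "RLS", "url": "https://example.com"}]


def test_empty_response_is_not_an_error():
    assert search("nothing", fake_fetch(200, "{}")) == []


def test_rate_limit_raises():
    try:
        search("anything", fake_fetch(429, ""))
    except RateLimitError:
        pass
    else:
        raise AssertionError("expected RateLimitError")


def test_bad_json_raises():
    try:
        search("anything", fake_fetch(200, "<html>"))
    except SearchError:
        pass
    else:
        raise AssertionError("expected SearchError")


for test in (test_parses_results, test_empty_response_is_not_an_error,
             test_rate_limit_raises, test_bad_json_raises):
    test()
print("all tests passed")
```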

What's Next

Web search is a foundational capability, not a feature in itself. Now that agents can search, we're watching how they use it. Do they search too often? Not enough? Are they asking the right questions?

We're also thinking about caching. Some searches, like 'Supabase RLS best practices', get repeated. Should we cache results? For how long? There's a balance between fresh data and API cost.
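If we do add caching, the simplest version is a time-to-live (TTL) cache keyed on a normalized query string. The sketch below is one possible shape, not a decision we've made. The `TTLCache` class, the lowercase-and-collapse-whitespace normalization, and the one-hour TTL in the demo are all assumptions. The clock is injectable so expiry can be tested without waiting. The sketch is also not thread-safe, so concurrent agents sharing it would need a lock.

```python
import time
from typing import Callable


class TTLCache:
    """Cache search results per normalized query for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, list[dict]]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Treat "Supabase RLS best practices" and "supabase  rls best practices" alike.
        return " ".join(query.lower().split())

    def get_or_fetch(self, query: str, fetch: Callable[[str], list[dict]]) -> list[dict]:
        key = self._key(query)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no API call
        results = fetch(query)  # miss or expired: pay for a real search
        self._store[key] = (now, results)
        return results


# Demo with a fake clock and a fake fetch that records every real call.
fake_now = [0.0]
calls: list[str] = []


def fake_fetch(query: str) -> list[dict]:
    calls.append(query)
    return [{"title": "RLS guide", "url": "https://example.com"}]


cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.get_or_fetch("Supabase RLS best practices", fake_fetch)   # miss: calls the API
cache.get_or_fetch("supabase rls best practices", fake_fetch)   # hit: same normalized key
fake_now[0] = 3601
cache.get_or_fetch("Supabase RLS best practices", fake_fetch)   # expired: calls again
print(len(calls))  # 2
```

The TTL is the knob for the trade-off described above. A short TTL keeps answers fresh, while a long one saves money on queries whose answers rarely change.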

The bigger opportunity is teaching agents when to search versus when to rely on their training. Search is expensive (in tokens and dollars). Knowing when to use it is the skill we're still developing.