Fixing What's Broken: A CI Pipeline Recovery Story

Sometimes the most critical bugs are the ones that fail silently. On March 16th, we merged a fix that addressed a fundamental issue in our content pipeline: two GitHub Actions workflows that had never successfully run.

The Problem

Our strug-blog-sync.yml and notify-progress-stream.yml workflows referenced a Gemini action that didn't exist, so every trigger failed silently. The impact: twelve stream entries never made it to our content pipeline, and blog sync operations failed without alerting anyone. Our progress stream, designed to show real-time development activity, was showing a fiction.
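One way to catch a reference like this before it merges is to list every `uses:` target in a workflow file and compare it against a set of actions someone has verified. The sketch below is illustrative only: the allowlist, the function names, and the action names in the sample workflow are hypothetical, and the check confirms only that a reference is on the list, not that the action exists upstream.

```python
import re

# Matches lines such as "- uses: owner/repo@ref" or "uses: ./local/action".
# Commented-out lines ("# uses: ...") are not matched.
USES_PATTERN = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)", re.MULTILINE
)


def find_action_references(workflow_text: str) -> list[str]:
    """Return every `uses:` target in a workflow file, in order of appearance."""
    return USES_PATTERN.findall(workflow_text)


def find_unverified_actions(workflow_text: str, verified: set[str]) -> list[str]:
    """Return referenced actions that are missing from the verified allowlist."""
    return [
        ref for ref in find_action_references(workflow_text)
        if ref not in verified
    ]


# Hypothetical workflow text; the second action name is a placeholder.
EXAMPLE_WORKFLOW = """
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Summarize
        uses: example-org/nonexistent-action@v1
"""

# Hypothetical allowlist of references that have been checked by hand.
VERIFIED = {"actions/checkout@v4"}
```

Calling `find_unverified_actions(EXAMPLE_WORKFLOW, VERIFIED)` flags the placeholder action. A check like this could run on pull requests so an unknown reference fails loudly at review time.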

The Fix

We replaced the phantom Gemini action with Strug Works dispatch, our autonomous agent orchestration system. This isn't just a find-and-replace fix. By routing through Strug Works, these workflows now benefit from the same intelligent task routing, retry logic, and observability that power our entire platform.

We also corrected a configuration error in CLAUDE.md, which documented the wrong Sanity project ID (ktfgvv39 instead of the correct prb5dynx). Documentation drift like this compounds over time, so it is better to catch it early.

To recover the lost data, we built a backfill script that reconstructed the twelve missing stream entries. These entries now accurately reflect the work that was done but never surfaced in our progress stream.
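A backfill is safest when it can be rerun without creating duplicates. The sketch below shows one way to get that property by deriving each entry ID from its source reference. It is an illustration, not the actual backfill script: `WorkRecord`, the field names, and the ID scheme are all assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkRecord:
    """Hypothetical record of completed work, e.g. taken from a merged PR."""
    source_ref: str      # assumed unique per piece of work (commit SHA, PR ID)
    title: str
    completed_at: str    # ISO 8601 timestamp, same format and zone for all records


def entry_id(record: WorkRecord) -> str:
    """Derive a stable ID from the source reference so reruns yield the same ID."""
    digest = hashlib.sha256(record.source_ref.encode("utf-8")).hexdigest()
    return f"stream-{digest[:12]}"


def plan_backfill(records: list[WorkRecord], existing_ids: set[str]) -> list[dict]:
    """Return the entries to create, oldest first, skipping IDs that already exist."""
    planned = []
    seen = set(existing_ids)
    # Lexicographic sort is chronological only for uniformly formatted timestamps.
    for record in sorted(records, key=lambda r: r.completed_at):
        new_id = entry_id(record)
        if new_id in seen:
            continue  # already present, or a duplicate within this batch
        seen.add(new_id)
        planned.append({
            "id": new_id,
            "title": record.title,
            "completedAt": record.completed_at,
            "backfilled": True,  # marks the entry as reconstructed after the fact
        })
    return planned
```

Because the IDs are deterministic, running the plan a second time against the updated set of existing IDs returns an empty list.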

What We Learned

Silent failures are the worst kind. Workflows that fail loudly get fixed quickly. Workflows that fail silently erode trust over time. We're now implementing health checks for all critical automation paths. These checks go beyond success or failure states: they also verify that each workflow runs at its expected frequency and that the data it produces is intact.
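A frequency check and a basic integrity check can be sketched in a few lines. The function names, the grace factor, and the idea of comparing ID sets are assumptions made for illustration, not a description of the checks being built.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stale(
    last_success: Optional[datetime],
    expected_interval: timedelta,
    now: datetime,
    grace: float = 1.5,
) -> bool:
    """Return True if no success was recorded within expected_interval * grace.

    A workflow that has never succeeded (last_success is None) counts as stale,
    which is the case a plain pass/fail check on the latest run can miss.
    """
    if last_success is None:
        return True
    return now - last_success > expected_interval * grace


def missing_ids(expected: set[str], actual: set[str]) -> list[str]:
    """Return IDs that should be present downstream but are not, sorted."""
    return sorted(expected - actual)
```

For example, with an expected interval of one hour and the default grace factor, a workflow whose last success was two hours ago is reported as stale, while one that succeeded thirty minutes ago is not.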

What's Next

This fix unblocks our entire content pipeline. With reliable CI workflows, Strug Stream will accurately reflect real-time development activity. Blog posts will sync automatically. Stream entries will flow without gaps.

We're also building better observability into our automation layer. Every workflow dispatch will emit structured events that feed into Strug Stream, creating a self-documenting audit trail. When something breaks, we'll know immediately, not twelve entries later.
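As a rough picture of what a structured event could look like, the sketch below builds a small record and serializes it as one JSON line. The schema, the status values, and the function names are hypothetical; the real event format is not described in this post.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of lifecycle states for a workflow dispatch.
ALLOWED_STATUSES = {"dispatched", "succeeded", "failed"}


@dataclass(frozen=True)
class DispatchEvent:
    workflow: str       # e.g. the workflow file name
    status: str         # one of ALLOWED_STATUSES
    occurred_at: str    # ISO 8601 timestamp in UTC
    detail: str = ""    # optional human-readable context


def make_event(
    workflow: str,
    status: str,
    detail: str = "",
    now: Optional[datetime] = None,
) -> DispatchEvent:
    """Build an event, rejecting unknown statuses so bad data fails loudly."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return DispatchEvent(workflow, status, timestamp, detail)


def to_json_line(event: DispatchEvent) -> str:
    """Serialize an event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)
```

Emitting a "failed" event with a short detail string for each broken dispatch would give the stream something concrete to surface, instead of leaving a gap where an entry should have been.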

Infrastructure work like this doesn't ship flashy features. But it's the foundation that makes everything else possible. Reliable automation is how small teams ship like large ones. And catching these silent failures early is how you maintain velocity as complexity grows.