Engineering · Mar 25, 2025

Teaching Sabine to Read Between the Lines: Holistic Biographical Document Processing

How we moved from rigid field extraction to context-aware biographical document understanding in Sabine.

We shipped an upgrade to how Sabine processes biographical documents. Instead of hunting for specific fields in uploaded files, Sabine now reads each document as a whole, taking in context, relationships, and narrative structure much as a human reader would.

The Problem with Field-by-Field Extraction

Traditional document parsing treats biographical information like a form: find the name field, extract the date, locate the address. This works fine for structured data, but life documents—resumes, personal statements, relationship histories—don't follow templates. They tell stories.

When someone uploads a document describing their career journey or personal background to Sabine, rigid extraction misses the connective tissue. It can't distinguish between a casual mention of a city and a place that shaped someone's identity. It doesn't understand that a gap in employment might be explained three paragraphs later.
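
To make the limitation concrete, here is a minimal sketch of a field-by-field extractor. It is an illustration only, not Sabine's previous code: the patterns, the `extract_fields` helper, and the sample text are all made up for this example.

```python
import re

# Hypothetical rigid extractor: one regex per field, no shared context.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:\s*(.+)$", re.MULTILINE),
    "city": re.compile(r"\b(?:in|to)\s+([A-Z][a-z]+)\b"),
    "year": re.compile(r"\b((?:19|20)\d{2})\b"),
}


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return every match for each field, with no notion of why it appears."""
    return {name: pattern.findall(text) for name, pattern in FIELD_PATTERNS.items()}


# Invented sample document.
sample = (
    "Name: Dana Reyes\n"
    "I moved to Lisbon in 2015 and it changed how I think about work.\n"
    "I once flew to Madrid for a weekend.\n"
    "I left my job in 2019. Two years later I returned, after caring for a parent.\n"
)

fields = extract_fields(sample)
print(fields)
```

Both cities come back as equal entries in a flat list, and the years carry no link to the sentence that explains the employment gap. That lost connection is what the rest of this post is about.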

Holistic LLM Extraction

The new approach treats each biographical document as a coherent whole. We pass the full content to an LLM with instructions to extract not just facts, but context: What matters here? How do these details relate? What's the throughline?
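
The shape of that call can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Sabine's production code: the instruction wording, the JSON keys, and the `build_prompt`, `extract_holistically`, and `stub_llm` names are hypothetical, and the model is represented by a plain callable so the example runs without any network access.

```python
import json
from typing import Callable

# Hypothetical instruction text; the prompt Sabine actually uses is not shown here.
INSTRUCTIONS = (
    "Read the whole document before answering. Return JSON with three keys: "
    "'facts' (list of statements), 'relationships' (how details connect), "
    "and 'throughline' (one sentence summarizing the narrative)."
)


def build_prompt(document_text: str) -> str:
    """Combine the instructions with the full, unsplit document."""
    return f"{INSTRUCTIONS}\n\n--- DOCUMENT START ---\n{document_text}\n--- DOCUMENT END ---"


def extract_holistically(document_text: str, llm: Callable[[str], str]) -> dict:
    """Send one prompt covering the entire document and parse the JSON reply.

    `llm` is any callable mapping a prompt string to a response string,
    so a real model client or a test stub can be plugged in.
    """
    reply = llm(build_prompt(document_text))
    result = json.loads(reply)
    # Fill in expected keys the model may have omitted.
    for key, default in (("facts", []), ("relationships", []), ("throughline", "")):
        result.setdefault(key, default)
    return result


def stub_llm(prompt: str) -> str:
    """Stand-in for a model call (assumption: the model replies with valid JSON)."""
    return json.dumps({
        "facts": ["Left a job in 2019", "Returned to work two years later"],
        "relationships": ["The employment gap is explained by caring for a parent"],
    })


profile = extract_holistically("I left my job in 2019. ...", stub_llm)
print(profile["relationships"])
```

The point of the sketch is that the document goes in as one piece. Nothing is split into per-field queries, so the model can connect a detail in one paragraph to its explanation in another.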

This means Sabine can now:

• Understand implicit relationships (mentioning someone repeatedly signals importance)
• Recognize contextual significance (a year abroad versus a vacation)
• Preserve narrative structure (how experiences build on each other)
• Handle varied formats without custom parsers
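
One way to picture the output of such an extraction is a small schema that has room for these signals. The sketch below is an assumption about what such a structure could look like, not Sabine's actual data model; the class names, the "formative" and "incidental" labels, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    mentions: int   # how often the document refers to this person
    role: str = ""  # e.g. "mentor"; empty if the document does not say


@dataclass
class Experience:
    summary: str
    significance: str  # "formative" or "incidental" (labels are assumptions)
    builds_on: list[int] = field(default_factory=list)  # indexes of earlier experiences


@dataclass
class BiographicalProfile:
    people: list[Person] = field(default_factory=list)
    experiences: list[Experience] = field(default_factory=list)

    def key_people(self, min_mentions: int = 2) -> list[str]:
        """Names of people mentioned at least `min_mentions` times."""
        return [p.name for p in self.people if p.mentions >= min_mentions]

    def narrative_chain(self, index: int) -> list[str]:
        """Collect an experience and everything it builds on, in list order."""
        seen: list[int] = []
        stack = [index]
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.append(i)
            stack.extend(self.experiences[i].builds_on)
        return [self.experiences[i].summary for i in sorted(seen)]


# Invented sample values.
bio = BiographicalProfile(
    people=[Person("Marta", 4, "mentor"), Person("Tom", 1)],
    experiences=[
        Experience("Year abroad in Lisbon", "formative"),
        Experience("Weekend trip to Madrid", "incidental"),
        Experience("Joined an international team", "formative", builds_on=[0]),
    ],
)
print(bio.key_people())
print(bio.narrative_chain(2))
```

Because the structure is independent of the input format, the same schema can be filled from a resume, a personal statement, or free-form prose without a parser per format.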

The extraction process preserves ambiguity when appropriate. If a document suggests uncertainty ("I think it was 2018"), Sabine captures that nuance rather than forcing a confident timestamp.
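
As a rough illustration of what keeping that nuance can mean in a data structure, the sketch below stores a year together with a certainty flag and the original wording. The `HedgedYear` type, the `capture_year` helper, and the keyword list are assumptions made for this example; in the holistic approach the judgment would come from the LLM rather than from keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed hedging phrases, used here only to keep the example self-contained.
HEDGES = ("i think", "maybe", "around", "roughly", "probably", "not sure")


@dataclass
class HedgedYear:
    year: Optional[int]  # None when the sentence contains no year
    certain: bool        # False when the source text hedges
    source_text: str     # original wording, kept verbatim


def capture_year(sentence: str) -> HedgedYear:
    """Extract a four-digit year and record whether the author sounded sure."""
    match = re.search(r"\b((?:19|20)\d{2})\b", sentence)
    year = int(match.group(1)) if match else None
    lowered = sentence.lower()
    hedged = any(phrase in lowered for phrase in HEDGES)
    return HedgedYear(year=year, certain=year is not None and not hedged, source_text=sentence)


uncertain = capture_year("I think it was 2018")
confident = capture_year("I graduated in 2012")
print(uncertain)
print(confident)
```

Keeping `source_text` alongside the parsed value means a later step can revisit the original phrasing instead of trusting a flattened timestamp.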

Why This Matters

Sabine's purpose is to be a true AI partnership platform—one that understands its users deeply enough to assist meaningfully. Biographical context is foundational to that understanding. The richer Sabine's model of a user's history, preferences, and relationships, the better it can serve as a thoughtful partner rather than a reactive tool.

This change directly improves profile quality. When users share their background, Sabine now builds a more accurate, contextualized representation that informs every subsequent interaction.

What's Next

We're working on extending this approach to ongoing conversational updates. Right now, holistic extraction applies to uploaded documents; soon it will apply to organic dialogue. If a user mentions a life change in passing, Sabine should update its understanding without requiring a formal profile edit.
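
Since this work is still in progress, the following is only a sketch of one possible shape for such an update step, not a description of what will ship. The `apply_update` function, the dictionary-based profile, and the `history` key are hypothetical; the update itself is assumed to have already been extracted from a conversational remark.

```python
from copy import deepcopy


def apply_update(profile: dict, update: dict) -> dict:
    """Return a new profile with `update` merged in; the input is left unchanged.

    Superseded values are recorded under "history" instead of being
    silently overwritten.
    """
    merged = deepcopy(profile)
    history = merged.setdefault("history", [])
    for key, new_value in update.items():
        if key == "history":
            continue  # the history log is managed here, not by updates
        old_value = merged.get(key)
        if old_value is not None and old_value != new_value:
            history.append({"field": key, "previous": old_value})
        merged[key] = new_value
    return merged


# Invented sample: the user mentions in passing that they moved.
current = {"city": "Lisbon", "employer": "Company X"}
updated = apply_update(current, {"city": "Porto"})
print(updated)
```

Returning a new dictionary and logging the previous value keeps the earlier state available, which matters once details need to be compared across time.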

We're also exploring temporal reasoning—helping Sabine understand how biographical details relate across time. Not just "User worked at Company X," but "User's perspective on teamwork evolved after their experience at Company X."

The goal remains the same: make Sabine a partner that truly knows you.