Discovery as a Hidden Cost
Discovery isn't a one-time onboarding step in agent workflows; it's a recurring tax that compounds across tasks.
What counts as discovery?
In my notes, “discovery” isn’t just file search. It’s any token spent to rebuild context that is already knowable and relatively stable:
- Locating the right files to touch (or avoid).
- Re-deriving module boundaries and responsibilities.
- Re-learning naming conventions and local patterns.
- Reconfirming public APIs, invariants, and “don’t break this” constraints.
- Re-reading the same chunks of code to regain confidence.
Discovery also has a second cost that’s harder to measure than tokens: it increases the surface area for subtle misalignment. When context reconstruction is incomplete, the agent often still produces something plausible. The failure mode is rarely a crash. It’s a change that “looks right” in isolation but violates an unstated assumption, misses a neighbor constraint, or duplicates an existing mechanism.
Why not just more context?
The tempting answer is “stuff more of the repo into the prompt.” Sometimes that works, but it has awkward properties: it’s expensive, it’s difficult to audit after the fact, and it doesn’t compose well across tasks. You end up treating context as a disposable blob rather than an artifact you can inspect, refine, and reuse.
The alternative hypothesis I wanted to test was simpler: instead of giving an agent more context, give it durable context—small, local, and stable facts about the codebase that can be reused across runs. Not a semantic oracle, not a magic index. Just enough structure to reduce rediscovery and make the “what did it know?” question answerable.
That hypothesis leads naturally to a librarian’s job: maintain a catalog, keep it current, and make it cheap to
find the right shelf. In my case, that meant building a local tool—creatively named librarian—written
in Rust and backed by SQLite.
The Librarian's Stack
I built librarian to be a repo-local anchor catalog. It uses Farcaster for
agent orchestration and stores everything in a local SQLite database. The schema is
intentionally simple:
- Blobs: Content-addressed storage of file chunks. Each blob has a stable ID (SHA-256).
- Anchors: Deterministic references to code regions.
- Symbol Index: Mapping symbols (functions, structs, etc.) to paths and line ranges.
By storing this locally, the "discovery tax" is paid once during ingest, and subsequent
agent runs can query the database for pennies (in tokens) instead of dollars.
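To make the three pieces concrete, here is a minimal in-memory sketch of how blobs, anchors, and symbol entries could relate. It is not librarian's actual SQLite schema. The type names, the field names, and the `path:kind:name` key are assumptions for illustration, and the hash is a standard-library stand-in rather than SHA-256, so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A chunk of file content, keyed by a hash of that content.
#[derive(Debug, Clone, PartialEq)]
struct Blob {
    id: String,
    content: String,
}

/// A reference to a code region (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Anchor {
    path: String,
    start_line: usize,
    end_line: usize,
    blob_id: String,
}

/// One symbol-index row: a named symbol and where it lives.
#[derive(Debug, Clone, PartialEq)]
struct SymbolEntry {
    kind: String,
    name: String,
    anchor: Anchor,
}

#[derive(Default)]
struct Catalog {
    blobs: HashMap<String, Blob>,
    symbols: HashMap<String, SymbolEntry>,
}

/// Stand-in content hash. ASSUMPTION: the real tool uses SHA-256;
/// DefaultHasher is used here only to stay dependency-free.
fn content_id(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

impl Catalog {
    /// Store a chunk. Identical content yields the same ID, so
    /// ingesting the same chunk twice does not duplicate it.
    fn put_blob(&mut self, content: &str) -> String {
        let id = content_id(content);
        self.blobs.entry(id.clone()).or_insert_with(|| Blob {
            id: id.clone(),
            content: content.to_string(),
        });
        id
    }

    /// Register a symbol under a hypothetical `path:kind:name` key.
    fn index_symbol(&mut self, entry: SymbolEntry) {
        let key = format!("{}:{}:{}", entry.anchor.path, entry.kind, entry.name);
        self.symbols.insert(key, entry);
    }

    /// Look up a symbol and the blob its anchor points at.
    fn lookup(&self, key: &str) -> Option<(&SymbolEntry, &Blob)> {
        let entry = self.symbols.get(key)?;
        let blob = self.blobs.get(&entry.anchor.blob_id)?;
        Some((entry, blob))
    }
}
```

The point of the content-addressed ID is that a lookup costs one key fetch rather than a re-read of the file, and re-ingesting unchanged content is a no-op.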
Actual agentic usage
I ran the workflow in the librarian repo itself and kept the commands
path-relative. The goal was to see if an agent could orient itself and perform
an edit using only the "durable context" provided by the catalog.
cd librarian
# Pay the tax once
librarian ingest
# High-leverage discovery
librarian discover "apply from symbol" --limit 10
# Drill down without reading the whole file
librarian outline --path src/ops/apply.rs
librarian context src/ops/apply.rs:120 --context 12
# Precise application
librarian apply --from-symbol "src/ops/apply.rs:fn:apply_from_symbol" --stdin
The operational loop is compact and deterministic: reindex, search, locate, outline, pull small context, then edit.
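The commands above pass two shapes of target: a `path:line` location for context and a `path:kind:name` symbol reference for apply. As a rough sketch of how such strings could be split apart, the parser below handles both. The `Target` enum and `parse_target` function are hypothetical names, and the grammar is inferred only from the two example commands, not taken from librarian's real argument parser.

```rust
/// The two target shapes seen in the example commands (hypothetical type).
#[derive(Debug, PartialEq)]
enum Target {
    /// e.g. "src/ops/apply.rs:120"
    Line { path: String, line: usize },
    /// e.g. "src/ops/apply.rs:fn:apply_from_symbol"
    Symbol { path: String, kind: String, name: String },
}

/// Split a target string from the right so that the path keeps any
/// earlier colons. Returns None for anything that fits neither shape.
fn parse_target(s: &str) -> Option<Target> {
    // rsplitn yields pieces in reverse order: last segment first.
    let parts: Vec<&str> = s.rsplitn(3, ':').collect();
    match parts.as_slice() {
        [name, kind, path]
            if !name.is_empty() && !kind.is_empty() && !path.is_empty() =>
        {
            Some(Target::Symbol {
                path: path.to_string(),
                kind: kind.to_string(),
                name: name.to_string(),
            })
        }
        [line, path] if !path.is_empty() => {
            // A two-part target is only valid if the tail is a line number.
            line.parse::<usize>().ok().map(|line| Target::Line {
                path: path.to_string(),
                line,
            })
        }
        _ => None,
    }
}
```

Keeping targets as plain strings means the agent can copy them straight out of discover or outline output into the next command.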
Apply in practice: resilient fallback
One of the most interesting parts of building this was handling the case where the index gets stale.
In one run, the symbol_index was empty, but the agent still needed to apply a change.
The fix was a layered fallback strategy in apply.rs:
- Primary: Resolve via symbol_id or blob_id in the index.
- Secondary: If the index is stale, try to resolve directly from the blobs table.
- Tertiary: As a final safety net, infer the symbol span directly from the source file using the same parsing logic used during ingest.
This layered approach turns a hard failure into a graceful degradation. The agent can still find the "shelf" even if the librarian hasn't finished re-labeling the books.
// From src/ops/apply.rs
fn resolve_symbol_target(conn: &rusqlite::Connection, root: &str, symbol: &str) -> anyhow::Result
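The signature alone does not show the control flow, so here is a minimal sketch of the three-tier fallback under stated assumptions. It is not the code from apply.rs: the two `HashMap`s stand in for the symbol index and the blobs table, `Span`, `ResolvedVia`, `resolve`, and `infer_span_from_source` are hypothetical names, and the final tier uses a naive brace-matching scan in place of the real ingest parser.

```rust
use std::collections::HashMap;

/// A resolved region of a file (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Span {
    path: String,
    start_line: usize,
    end_line: usize,
}

/// Which tier produced the answer; useful for logging staleness.
#[derive(Debug, PartialEq)]
enum ResolvedVia {
    Index,
    Blobs,
    SourceScan,
}

/// Tertiary tier: find `fn <name>(` and match braces to get a 1-based
/// line range. Naive stand-in: it ignores braces inside strings and
/// comments and does not handle generic functions.
fn infer_span_from_source(path: &str, source: &str, name: &str) -> Option<Span> {
    let needle = format!("fn {}(", name);
    let mut start: Option<usize> = None;
    let mut depth = 0usize;
    let mut seen_open = false;
    for (i, line) in source.lines().enumerate() {
        if start.is_none() {
            if !line.contains(&needle) {
                continue;
            }
            start = Some(i + 1);
        }
        for ch in line.chars() {
            match ch {
                '{' => {
                    depth += 1;
                    seen_open = true;
                }
                '}' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        // The body is closed once we are back at depth zero.
        if seen_open && depth == 0 {
            return Some(Span {
                path: path.to_string(),
                start_line: start?,
                end_line: i + 1,
            });
        }
    }
    None
}

/// Try each tier in order and report which one succeeded.
fn resolve(
    symbol_index: &HashMap<String, Span>,
    blobs: &HashMap<String, Span>,
    path: &str,
    source: &str,
    name: &str,
) -> Option<(Span, ResolvedVia)> {
    let key = format!("{}:fn:{}", path, name);
    symbol_index
        .get(&key)
        .cloned()
        .map(|s| (s, ResolvedVia::Index))
        .or_else(|| blobs.get(&key).cloned().map(|s| (s, ResolvedVia::Blobs)))
        .or_else(|| {
            infer_span_from_source(path, source, name).map(|s| (s, ResolvedVia::SourceScan))
        })
}
```

Returning the tier alongside the span is a deliberate choice in this sketch: a caller that sees `SourceScan` knows the index is stale and can schedule a re-ingest instead of silently relying on the fallback.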
Beyond the Tax
Moving from "disposable context" to "durable context" changes the relationship with the agent. It stops being a black box you throw tokens at, and starts being a collaborator that shares your mental model of the codebase.
Next up: diving into how librarian handles Earth Observation UDFs in the
eo-processor repo. Because if you think discovery is hard in a Rust repo,
try doing it across 50GB of satellite imagery metadata.