Agent Systems Fail Quietly: Why Orchestration Matters More Than Intelligence
Most agent systems don’t fail because models are weak — they fail because coordination is underspecified, and the failures are silent.
The quiet failure
I asked an “agent” to do a small, boring refactor: rename a function, update call sites, run tests, and commit the change.
Halfway through, the run timed out. I re-ran it.
Nothing crashed. Nothing threw an error. The repo even looked fine at a glance.
But the edit had been applied twice in one file, once in another, and a downstream task later failed because reality had drifted: function signatures no longer matched what the next task assumed.
The dangerous part wasn’t the mistake.
The dangerous part was that it happened quietly.
Smarter agents don’t solve coordination
Models are improving. Tooling is improving. Prompting patterns are improving.
But coordination bugs don’t disappear with better reasoning, because they aren’t “thinking” problems. They’re failure-handling problems:
- retries after timeouts
- workers crashing mid-task
- partial progress and ambiguous state
- concurrent edits and resource contention
- quota cutoffs on the configured model
- “did the effect happen?” uncertainty
No amount of intelligence makes a subprocess transactional by default.
This is distributed systems déjà vu
Agent workflows are distributed systems whether we admit it or not.
- Agents are fallible workers.
- Prompts are jobs.
- Outputs are messages.
- Applying output is a side effect.
Distributed systems don’t fail politely: messages replay, processes die at inconvenient times, and “exactly-once” is mostly marketing shorthand.
What’s new is not the coordination problem. What’s new is that we’re now trying to apply these workflows to codebases, infrastructure, and datasets where silent drift is expensive.
Orchestration is the missing layer
Orchestration isn’t glamorous. It’s the part nobody demos.
It’s also the part that makes agent workflows survivable:
- Durable state (not “in-memory vibes”)
- Explicit dependencies (DAGs, not hope)
- Leases + heartbeats (ownership is rented, not assumed)
- Retries with memory (and tombstones for post-mortems)
- Audit logs (so you can reconstruct what happened)
- Human visibility and intervention points
Agents propose. Systems decide.
That separation is the difference between “agent automation” and an actual system.
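To make those pieces less abstract, here is a minimal sketch of the kind of record an orchestrator might persist for each job. The field names are illustrative assumptions, not Farcaster’s actual schema.

from dataclasses import dataclass, field
from typing import Optional

# Illustrative only: one durable record per job, owned by the orchestrator.
@dataclass
class JobRecord:
    job_id: str
    depends_on: list[str]                  # explicit dependencies: edges in a DAG
    status: str = "queued"                 # queued | running | done | failed | dead
    owner: Optional[str] = None            # which worker currently leases the job
    lease_expires_at: float = 0.0          # ownership is rented, not assumed
    attempts: int = 0                      # retries with memory
    events: list[dict] = field(default_factory=list)  # append-only audit trail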
Lessons from building Farcaster (a practical example)
I’ve been building a personal orchestration project called Farcaster: a small multi-agent orchestration system for code and workflow tasks. It’s not meant to be a product pitch — it’s the place I’ve been stress-testing the boring realities of “agentic” workflows.
Three implementation lessons kept recurring:
1) Treat agent output as data, not actions
Farcaster stores structured agent outputs durably, then processes them in a separate step. That way, if the system crashes, you can replay interpretation safely — or at least know exactly what was emitted.
// safer (proposal-based) shape
output = run_agent(job)
output_id = store_durably(job_id, output) // append-only record
enqueue_for_processing(output_id) // interpretation is separate
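As a more concrete version of that shape, here is a small Python sketch that uses SQLite as the durable store. The table layout and the apply_proposal callback are assumptions made for illustration, not Farcaster’s actual implementation.

import json
import sqlite3
import uuid

# Illustrative only: persist the raw agent output first, interpret it later.
db = sqlite3.connect("orchestrator.db")
db.execute("""
CREATE TABLE IF NOT EXISTS agent_outputs (
    output_id TEXT PRIMARY KEY,
    job_id    TEXT NOT NULL,
    payload   TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
)
""")

def record_output(job_id: str, output: dict) -> str:
    """Durably store the proposal before any side effect happens."""
    output_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO agent_outputs (output_id, job_id, payload) VALUES (?, ?, ?)",
            (output_id, job_id, json.dumps(output)),
        )
    return output_id

def process_pending(apply_proposal) -> None:
    """Interpretation is a separate, replayable step over the durable record."""
    rows = db.execute(
        "SELECT output_id, payload FROM agent_outputs WHERE processed = 0"
    ).fetchall()
    for output_id, payload in rows:
        apply_proposal(json.loads(payload))  # the only place side effects happen
        with db:
            db.execute(
                "UPDATE agent_outputs SET processed = 1 WHERE output_id = ?",
                (output_id,),
            )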
2) Make ownership explicit (leases + heartbeats)
A worker doesn’t “own” a job forever; it leases it. If it stops heartbeating, the lease expires and another worker can reclaim the job. This avoids the “dead worker holds the lock forever” problem without manual babysitting.
// pseudo-lease model
job = claim_next_job(worker_id, lease_for=60s)
while running(job):
    heartbeat(job, worker_id, extend_lease=60s)
finish(job) // success or failure recorded durably
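To make that loop concrete, here is a sketch of the claim and heartbeat steps against a SQLite jobs table. The schema is assumed for illustration, and the claim is deliberately simplified; a production version would need an atomic claim (for example, a single conditional UPDATE) to be safe across concurrent workers.

import time

LEASE_SECONDS = 60

def claim_next_job(db, worker_id: str):
    """Claim one job whose lease is free or has expired (simplified sketch)."""
    now = time.time()
    with db:
        row = db.execute(
            "SELECT job_id FROM jobs "
            "WHERE status = 'queued' "
            "   OR (status = 'running' AND lease_expires_at < ?) "
            "LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        db.execute(
            "UPDATE jobs SET status = 'running', owner = ?, lease_expires_at = ? "
            "WHERE job_id = ?",
            (worker_id, now + LEASE_SECONDS, row[0]),
        )
    return row[0]

def heartbeat(db, job_id: str, worker_id: str) -> bool:
    """Extend the lease only if this worker still owns the job."""
    with db:
        cur = db.execute(
            "UPDATE jobs SET lease_expires_at = ? WHERE job_id = ? AND owner = ?",
            (time.time() + LEASE_SECONDS, job_id, worker_id),
        )
    return cur.rowcount == 1  # 0 means the lease was reclaimed by another worker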
3) Preserve history (tombstones and events)
When something fails repeatedly, you need more than a final status. Farcaster keeps event trails and “tombstones” for dead jobs so failures remain inspectable after cleanup. Otherwise you just accumulate a graveyard of mysteries.
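Here is a sketch of what preserving history can mean in practice, assuming an append-only job_events table and a tombstone written at cleanup time; the names are illustrative, not Farcaster’s schema.

import json
import time

def append_event(db, job_id: str, kind: str, detail: dict) -> None:
    """Append-only event trail: rows are added, never updated or deleted."""
    with db:
        db.execute(
            "INSERT INTO job_events (job_id, ts, kind, detail) VALUES (?, ?, ?, ?)",
            (job_id, time.time(), kind, json.dumps(detail)),
        )

def tombstone(db, job_id: str, reason: str) -> None:
    """Mark a dead job with a compact, inspectable marker instead of deleting it."""
    append_event(db, job_id, "tombstoned", {"reason": reason})
    with db:
        db.execute("UPDATE jobs SET status = 'dead' WHERE job_id = ?", (job_id,))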
The recurring theme: coordination needs durable memory. Without it, retries become corruption.
Farcaster isn’t presented as a solution to adopt — it’s the environment where these constraints became unavoidable.
What changes when outputs become proposals
When you treat agent output as a proposal, you unlock a bunch of “boring” safety properties:
- Replay safety: outputs can be reprocessed after a crash.
- Deduplication: repeated outputs can be detected and ignored.
- Auditing: you can trace who/what proposed a change.
- Human gates: approvals can sit between proposal and effect.
- Resumption: the system can resume without guessing.
Traditional:
agent_output -> side_effects
Proposal-based:
agent_output -> durable_record -> orchestration -> side_effects
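One way the deduplication property falls out of the durable record is sketched below, as a variant of record_output that derives an idempotency key from the proposal’s content. The keying scheme is an assumption made for illustration, not a claim about how Farcaster identifies outputs.

import hashlib
import json
import sqlite3

def output_key(job_id: str, output: dict) -> str:
    """Derive a stable key from the job and the proposal content."""
    blob = json.dumps({"job_id": job_id, "output": output}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def record_once(db, job_id: str, output: dict) -> bool:
    """Insert the proposal unless an identical one was already recorded."""
    key = output_key(job_id, output)
    try:
        with db:
            db.execute(
                "INSERT INTO agent_outputs (output_id, job_id, payload) VALUES (?, ?, ?)",
                (key, job_id, json.dumps(output)),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate detected and ignored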
This is not about distrusting agents. It’s about making the system robust to the basic truth that failures happen.
Why this matters now
Agents are cheap to spawn, so we spawn lots of them.
That increases concurrency. Concurrency increases failure frequency. And failures without orchestration increase silent drift.
Coordination problems scale faster than intelligence.
Closing
This isn’t an anti-AI post. It’s not a model critique. And it’s not a framework announcement.
It’s a systems warning: once agent workflows touch real code or real data, orchestration matters more than cleverness.