Stop Doing Infrastructure in HTTP Handlers
An HTTP handler is a mayfly. It is born, it runs for a moment, and then it dies—sometimes gracefully, sometimes mid-sentence. In the long run, we’re all just timeouts with better marketing.
Meanwhile, the work we ask it to do (provisioning, imports, cluster mutations, cloud calls) is… not a mayfly. It’s a three-act play with retries, partial failure, cleanup, and at least one person asking if we can "just refresh".
Rule of thumb: if the work has meaningful failure modes, don’t pretend request/response is a good container for it.
This post is a clean-room demo of a pattern I keep coming back to when I want systems to be boring in the right way:
- Intent → accept a request, validate it, and write down what should exist
- CRD → persist intent as durable state in Kubernetes
- Operator → reconcile that intent into real resources
- Status → make progress observable (not vibes-based)
You can call this "operator-first" if you want. I think it’s mostly just admitting physics exists.
The problem: “just do it in the handler”
The synchronous approach starts simple: do the work inside `POST /thing`, return `200` when done.
Then reality arrives:
- timeouts (client, load balancer, proxy, server)
- retries you don’t control (and usually can’t distinguish from the original request)
- partial failures (some resources get created, others don't)
- no coherent status model (poll endpoints, scrape logs, guess)
The universe does not care that your handler timed out. It will happily deliver partial failure anyway.
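That last point is worth seeing. Here is a minimal Go sketch of the failure mode, with a goroutine and a context deadline standing in for the request; `doWork` and the timings are invented for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doWork stands in for a multi-step provisioning call. It keeps
// running even after the caller (the HTTP request) has gone away.
func doWork(done chan<- string) {
	time.Sleep(50 * time.Millisecond) // step 1 succeeds
	done <- "resource-a created"
	time.Sleep(50 * time.Millisecond) // step 2 also succeeds, but nobody is waiting
	done <- "resource-b created"
}

func main() {
	// The "request": a context that gives up after 75ms.
	ctx, cancel := context.WithTimeout(context.Background(), 75*time.Millisecond)
	defer cancel()

	done := make(chan string, 2)
	go doWork(done)

	<-ctx.Done()
	fmt.Println("client: timed out, saw nothing")

	// The work finished anyway: state now exists that no
	// response ever reported.
	fmt.Println("server:", <-done)
	fmt.Println("server:", <-done)
}
```

The client's 75ms deadline lands between step 1 and step 2: from its point of view the request failed, while the server created both resources anyway.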
The pattern: Intent → CRD → Operator → Status
Kubernetes is, at its core, a reconciliation engine: desired state in, convergence out. So if your API is doing orchestration, you can stop fighting the grain and just… stop cosplaying as a scheduler.
The API becomes a thin intent writer. The operator becomes the workhorse. Status becomes an API surface you can actually trust.
Client
|
| POST /work (intent)
v
API server
|
| creates WorkRequest CR
v
Kubernetes API (etcd)
|
| watch WorkRequest
v
Controller/Operator
|
| creates Job
v
Job runs
|
| controller mirrors Job state
v
WorkRequest.status
The demo
I built a small clean-room repo that implements this pattern using:
- a CRD called `WorkRequest`
- a controller that creates one Kubernetes `Job` per `WorkRequest`
- a tiny HTTP API that returns `202 Accepted` and lets the operator do the work
Repo: github.com/bnjam/intent-crd-demo
Example CR
apiVersion: demo.demo.bnj.am/v1alpha1
kind: WorkRequest
metadata:
  generateName: work-
spec:
  message: "hello world"
  durationSeconds: 3
  shouldFail: false
Status
status:
  phase: Succeeded
  jobName: work-abc12-worker
  conditions:
  - type: JobCreated
    status: "True"
    reason: Created
  - type: Completed
    status: "True"
    reason: Succeeded
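The "controller mirrors Job state" step is mostly a pure mapping from observed Job state to a reported phase. A sketch of that mapping, where `JobState` is a simplified stand-in for `batchv1.JobStatus` and the phase names match the status above:

```go
package main

import "fmt"

// JobState is a simplified view of the Kubernetes Job status the
// controller watches (real code would read batchv1.JobStatus).
type JobState struct {
	Succeeded int32
	Failed    int32
	Active    int32
}

// phaseFor mirrors observed Job state into a WorkRequest phase: a
// pure, replay-safe function, so reconciling the same state twice
// reports the same thing twice.
func phaseFor(job JobState) string {
	switch {
	case job.Succeeded > 0:
		return "Succeeded"
	case job.Failed > 0:
		return "Failed"
	case job.Active > 0:
		return "Running"
	default:
		return "Pending"
	}
}

func main() {
	fmt.Println(phaseFor(JobState{Succeeded: 1})) // Succeeded
	fmt.Println(phaseFor(JobState{Active: 1}))    // Running
}
```

Keeping this a pure function of observed state is what makes status trustworthy: there is no cached "I think it's done" anywhere, only what the cluster currently reports.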
Run it (operator + API in-cluster via k3d)
This setup runs both the operator and the API inside the cluster. You port-forward the API to localhost.
make demo-deploy-all IMG=intent-crd-demo:dev API_IMG=intent-crd-demo-api:dev
make api-port-forward
Then create work and watch it converge:
./scripts/demo.sh
SHOULD_FAIL=true ./scripts/demo.sh
Reset demo objects (clean screenshots):
make demo-reset
What you get for free
- Durability: intent survives restarts and transient failure
- Idempotency: reconciliation can be written to be replay-safe
- Observability: status is watchable (`kubectl get ... -w`) and queryable
- Separation of concerns: API is policy + validation; operator is lifecycle + orchestration
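The idempotency bullet deserves a concrete shape. This is not the demo's actual controller code, just a sketch of the create-if-absent move that makes a reconcile replay-safe, with a map standing in for the cluster:

```go
package main

import "fmt"

// jobStore stands in for the cluster: the set of Jobs that exist.
type jobStore map[string]bool

// ensureJob is replay-safe: calling it once or five times (crash,
// requeue, duplicate event) converges on the same cluster state.
// Real controllers get the same effect by treating "already exists"
// errors from the API server as success.
func ensureJob(cluster jobStore, name string) (created bool) {
	if cluster[name] {
		return false // Job already exists: reconcile is a no-op
	}
	cluster[name] = true // stand-in for client.Create(ctx, &job)
	return true
}

func main() {
	cluster := jobStore{}
	fmt.Println(ensureJob(cluster, "work-abc12-worker")) // true: created
	fmt.Println(ensureJob(cluster, "work-abc12-worker")) // false: replayed, no-op
}
```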
If you’ve ever built a “background worker” system and then slowly reinvented leases, state machines, retries, and status APIs: congratulations, you discovered why controllers exist. One must imagine Sisyphus happy—because at least the retry loop is idempotent.
Tradeoffs
This pattern is not free. You are building a distributed system with a state machine and a controller loop. Which sounds intimidating until you remember you were already doing that—just badly, inside HTTP handlers.
The cost is that you now need to care about:
- CRD versioning
- status design (what does “done” mean?)
- garbage collection / finalizers
- concurrency limits and backpressure
The absurdity doesn’t go away. You just put it in a place that can retry.
Closing thought
Most of platform engineering is taking chaos, naming it, and making it observable. The rest is writing enough guardrails that the next person can be productive without being a hero.
Returning 202 and reconciling later is not a downgrade. It’s admitting what the work actually is.
The meaning doesn’t arrive automatically; you manufacture it by making failure states explicit and progress observable.