Just-in-Time SSH Certificates for Workspace VMs (k3d + Dex + Rust)

I wanted a repeatable way to SSH into ephemeral “workspace VMs” without distributing authorized_keys files everywhere or relying on long-lived keys. The result is a small stack I’ve been calling ssh-vm-flow: OIDC-backed identity, short-lived SSH certificates, a Rust signing/proxy gateway, and a control-plane API that provisions one managed SSH target per workspace.

The problem: SSH access doesn't age well

Static SSH keys are frictionless at the start and expensive later: keys get copied, access is hard to audit, and revocation usually means hunting down files. If you’re spinning up short-lived dev/analysis environments, you need access that expires on its own, is tied to a real identity, and can be audited.

What I built

The current beta (as of 0.0.1-beta5) is split into a few small pieces:

The flow (end-to-end)

The key idea is that the user never manually edits server key files. They authenticate to Dex, then request a certificate that’s valid for a few minutes.

1) jitctl auth login        (Dex auth code + PKCE)
2) jitctl workspace set WS  (updates local config + ~/.ssh/config alias)
3) jitctl vm connect        (orchestrates everything)
   - ensure local keypair
   - PUT /v1/keys/current          (store user pubkey in Vault)
   - POST /v1/workspaces           (ensure namespace + ide-ssh-vm)
   - POST /v1/sign (workspace_id)  (gateway validates JWT + signs cert)
   - ssh to gateway :2222 using key + cert
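
Step 2 mentions a ~/.ssh/config alias. The following is a minimal sketch of how such a stanza might be rendered, not jitctl's actual code. The struct, its field names, and the example paths are assumptions for illustration; only the OpenSSH keywords (Host, HostName, Port, User, IdentityFile, CertificateFile, IdentitiesOnly) and port 2222 come from OpenSSH and the flow above.

```rust
use std::fmt::Write;

/// Hypothetical inputs for one workspace alias (names are illustrative only).
pub struct SshAlias<'a> {
    pub alias: &'a str,            // name typed after `ssh`, e.g. "ws-demo"
    pub gateway_host: &'a str,     // assumed gateway hostname
    pub port: u16,                 // the flow above connects to :2222
    pub user: &'a str,             // login user on the workspace VM
    pub identity_file: &'a str,    // local private key
    pub certificate_file: &'a str, // short-lived cert returned by the sign call
}

/// Render an OpenSSH client config stanza that pairs the key with its certificate.
pub fn render_ssh_alias(a: &SshAlias) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so unwrap() is safe here.
    writeln!(out, "Host {}", a.alias).unwrap();
    writeln!(out, "    HostName {}", a.gateway_host).unwrap();
    writeln!(out, "    Port {}", a.port).unwrap();
    writeln!(out, "    User {}", a.user).unwrap();
    writeln!(out, "    IdentityFile {}", a.identity_file).unwrap();
    writeln!(out, "    CertificateFile {}", a.certificate_file).unwrap();
    // Offer only the configured key and cert, not every key the agent holds.
    writeln!(out, "    IdentitiesOnly yes").unwrap();
    out
}
```

With a stanza like this in place, `ssh ws-demo` would present the key and certificate together, which is why the user never has to touch server-side key files.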

There’s also an IDE/browser path that is being phased in: requests to /ide-jit-ssh/v1/sign are protected by OAuth2-Proxy forward-auth, so a browser can bootstrap a cert flow without a locally cached token.

How the gateway stays “fail-closed”

The gateway’s job is simple: only sign if the token is valid, then route SSH traffic to the right backend for a short window.
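
To make "fail-closed" concrete, here is a small sketch of a signing decision in which every path except an explicit success is a denial. It is an illustration under assumptions, not the gateway's real code: the `Claims` shape, the `Deny` variants, the expected audience string, and the 300-second TTL are all hypothetical, and JWT signature verification is assumed to have happened upstream.

```rust
use std::time::Duration;

/// Claims taken from a JWT whose signature and issuer were already verified
/// (hypothetical shape).
pub struct Claims {
    pub subject: String,
    pub audience: String,
    pub expires_at: u64, // Unix seconds
}

/// Reasons for refusing to sign; there is no "allow by default" variant.
#[derive(Debug, PartialEq)]
pub enum Deny {
    TokenInvalid,
    TokenExpired,
    WrongAudience,
    MissingWorkspace,
}

/// What the signer is allowed to issue after a successful check.
#[derive(Debug, PartialEq)]
pub struct Grant {
    pub principal: String,
    pub workspace_id: String,
    pub cert_ttl: Duration,
}

// Assumed values, chosen only for the example.
const EXPECTED_AUDIENCE: &str = "ssh-gateway";
const CERT_TTL: Duration = Duration::from_secs(300);

/// `verified` is None when upstream verification failed for any reason.
pub fn decide_sign(
    verified: Option<Claims>,
    workspace_id: &str,
    now: u64,
) -> Result<Grant, Deny> {
    // A missing or unverifiable token is a denial, never a fallback.
    let claims = verified.ok_or(Deny::TokenInvalid)?;
    if now >= claims.expires_at {
        return Err(Deny::TokenExpired);
    }
    if claims.audience != EXPECTED_AUDIENCE {
        return Err(Deny::WrongAudience);
    }
    if workspace_id.trim().is_empty() {
        return Err(Deny::MissingWorkspace);
    }
    Ok(Grant {
        principal: claims.subject,
        workspace_id: workspace_id.to_string(),
        cert_ttl: CERT_TTL,
    })
}
```

Because the function returns a `Result`, a caller cannot sign without handling the denial case first. A successful grant is also the natural point at which a gateway would record the short-lived route for the caller.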

That “route binding” is the pragmatic compromise for SSH over raw TCP: Traefik forwards an opaque TCP stream, so the gateway needs its own way to decide which backend gets each connection. In local k3d mode, the main caveat is NAT: multiple clients can appear behind one source IP, so the last bind wins until the TTL expires.
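
The NAT caveat is easier to see in code. The following sketch assumes bindings are keyed by source IP and carry an expiry; the `RouteTable` type, its methods, and the backend names are hypothetical and not taken from the gateway. Time is passed in explicitly so the behaviour is easy to reason about.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One short-lived binding: the backend a source IP should reach, and until when.
struct Binding {
    backend: String,
    expires_at: Instant,
}

/// Route table keyed by source IP (an assumed design for illustration).
#[derive(Default)]
pub struct RouteTable {
    bindings: HashMap<IpAddr, Binding>,
}

impl RouteTable {
    /// Record a binding. An existing entry for the same IP is overwritten,
    /// which is exactly the "last bind wins" behaviour.
    pub fn bind(&mut self, source: IpAddr, backend: &str, ttl: Duration, now: Instant) {
        let binding = Binding {
            backend: backend.to_string(),
            expires_at: now + ttl,
        };
        self.bindings.insert(source, binding);
    }

    /// Resolve the backend for an incoming connection.
    /// Missing or expired bindings yield None, so the connection is refused.
    pub fn resolve(&mut self, source: IpAddr, now: Instant) -> Option<String> {
        let expired = match self.bindings.get(&source) {
            Some(b) => now >= b.expires_at,
            None => return None,
        };
        if expired {
            // Drop stale entries eagerly so they cannot be reused.
            self.bindings.remove(&source);
            return None;
        }
        self.bindings.get(&source).map(|b| b.backend.clone())
    }
}
```

If two users behind the same NAT address bind within the same window, the second `bind` replaces the first, and both connections resolve to the second user's backend until that entry expires. A gateway built this way relies on the backend still checking the SSH certificate, since the route alone proves nothing about who is connecting.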

Traefik multi-port routing is the unsung hero

A practical requirement for this project was to expose both web APIs and SSH through one ingress layer. Traefik makes this relatively clean:

That means the client experience stays simple: the same “front door” works for both signing requests and SSH sessions.

What I’d improve next

If I keep iterating, these are the highest-leverage changes:

Closing thought

SSH certificates make access feel like an API: authenticate, request a short-lived credential, and connect. Once that’s in place, provisioning ephemeral workspaces becomes much less scary because access has a clear lifetime and a clear identity story.