Ground Truth · the Jetty blog
From the blog
Notes on the tools, practices, and lessons behind shipping reliable AI systems. Subscribe on Substack

Generation Got Cheap. Verification Didn’t.
Cheaper tokens don’t make AI cheaper; they just let you generate errors faster than you can catch them. Verification is the real bottleneck, and closing that gap is what everything below is about.
Read on Substack
Runbooks: what agents need to hill-climb
The missing layer between “call this API” and “accomplish this outcome”: a structured Markdown doc with an output manifest, evaluation criteria, and a verification gate the agent can’t talk its way past.
My Backend is 442 Lines of Markdown
A real app we shipped whose entire backend is one runbook; the artifact is public. How the pattern is holding up in production, with expensive models authoring what cheap models execute.
Visual workflows are procedural programming in a costume
Why outcome specs beat the drag-and-drop node graphs you’re probably evaluating: runbooks compose, version, and diff like code, which is everything a visual builder quietly gives up past ~50 nodes.
Research Closes the Loop. Production Keeps Us In It.
Auto-optimization is safe against a trustworthy benchmark; in production you keep a human merge gate. Runbook changes ship like code (as reviewed diffs in git), which is what keeps the verifier honest.