Ground Truth · the Jetty blog

From the blog

Notes on the tools, practices, and lessons behind shipping reliable AI systems.

Generation Got Cheap. Verification Didn’t.

Cheaper tokens don’t make AI cheaper — they just generate errors faster than you can catch them. Verification is the real bottleneck, and closing that gap is what everything below is about.

Read on Substack

Runbooks: what agents need to hill-climb

The missing layer between “call this API” and “accomplish this outcome”: a structured markdown doc with an output manifest, evaluation criteria, and a verification gate the agent can’t talk its way past.
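As a rough illustration of the three pieces named above, here is a minimal Python sketch. It is not the runbook format from the post: `Runbook`, `Criterion`, and `verification_gate` are hypothetical names, and the manifest is assumed to be a list of relative file paths. The point it tries to show is that the gate inspects artifacts on disk and never consults the agent's own report of success.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


# Hypothetical structures for illustration only.
@dataclass
class Criterion:
    description: str
    check: Callable[[Path], bool]  # receives the run's output directory


@dataclass
class Runbook:
    output_manifest: List[str]  # relative paths the run must produce
    criteria: List[Criterion] = field(default_factory=list)


def verification_gate(runbook: Runbook, out_dir: Path) -> List[str]:
    """Return a list of failures; an empty list means the gate passes."""
    failures = []

    # Step 1: every file in the manifest must exist and be non-empty.
    for rel in runbook.output_manifest:
        path = out_dir / rel
        if not path.is_file() or path.stat().st_size == 0:
            failures.append(f"missing or empty output: {rel}")
    if failures:
        # Criteria usually read the outputs, so stop here if any are absent.
        return failures

    # Step 2: every evaluation criterion must hold; a crash counts as a failure.
    for criterion in runbook.criteria:
        try:
            passed = criterion.check(out_dir)
        except Exception:
            passed = False
        if not passed:
            failures.append(f"criterion failed: {criterion.description}")
    return failures
```

A caller would run the agent, then call `verification_gate(runbook, out_dir)` and accept the run only when the returned list is empty. Returning the failure messages, rather than a bare boolean, gives the agent something concrete to retry against.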

My Backend is 442 Lines of Markdown

A real app we shipped whose entire backend is one runbook; the artifact is public. The pattern is holding up in production, with expensive models authoring what cheap models execute.

Visual workflows are procedural programming in a costume

Why outcome specs beat the drag-and-drop node graphs you’re probably evaluating: runbooks compose, version, and diff like code — everything a visual builder quietly gives up past ~50 nodes.

Research Closes the Loop. Production Keeps Us In It.

Auto-optimization is safe against a trustworthy benchmark; in production you keep a human merge gate. Runbook changes ship like code — reviewed diffs in git — which is what keeps the verifier honest.

Read the full archive on Substack