Runbooks

A runbook is the unit of work in Jetty. It's one short markdown file that tells a coding agent what to do, what “done” looks like, and how to check its own work before finishing. You write it once, and you (or your team, or a stranger who forked it) can run it a hundred times after that.

Skills + standards = runbooks. A skill is a set of instructions. A runbook adds the definition of done and a way to verify it. That second half is what lets you trust a runbook to run without you watching.

The three parts

Every runbook has three parts. Write them in any order, and make each one as short or as detailed as the task needs.

The job. What you want done, written the way you'd brief a new hire.
What “done” looks like. What the finished work must contain, and what would make you send it back. These become the evals.
How to check. What the agent verifies before declaring done, and what to do if a check fails (usually: fix it and retry a few times, then report the failure rather than ship something broken).
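
The check, fix, and retry behavior described in the third part can be sketched as a small loop. This is only an illustration of the idea, not Jetty's implementation: the `Check` type, `verify_with_retries`, and the toy `summary_exists` and `write_summary` functions are all hypothetical names made up for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical shape for a check: returns (passed, message).
Check = Callable[[], Tuple[bool, str]]


def verify_with_retries(
    checks: List[Check],
    fix: Callable[[List[str]], None],
    max_tries: int = 3,
) -> Tuple[bool, List[str]]:
    """Run all checks; on failure, call fix() and recheck, up to max_tries rounds.

    Returns (True, []) once every check passes, or (False, failures) so the
    caller can report what is still broken instead of shipping it.
    """
    failures: List[str] = []
    for attempt in range(1, max_tries + 1):
        failures = [msg for passed, msg in (check() for check in checks) if not passed]
        if not failures:
            return True, []
        if attempt < max_tries:
            fix(failures)  # no point fixing after the final round
    return False, failures


# Toy usage: the first round fails, the fix repairs it, the second round passes.
state = {"summary_written": False}


def summary_exists() -> Tuple[bool, str]:
    return state["summary_written"], "summary is missing"


def write_summary(failures: List[str]) -> None:
    state["summary_written"] = True


ok, remaining = verify_with_retries([summary_exists], write_summary)
print(ok, remaining)  # True []
```

The important design choice is the last line of the function: when the tries run out, the loop hands back the list of failures rather than pretending the work is done.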

What a runbook looks like

Markdown body plus a small frontmatter block. The frontmatter is where the machine-readable settings live: which runtime runs it, which model, what the inputs and primary outputs are, and how it's evaluated.

---
agent: claude-code
model: anthropic/claude-sonnet-4.6
evaluation: rubric
---

# Brand voice review

## The job
Read the uploaded draft. Check it against our brand voice guide
(uploads/brand-voice.md). Produce a marked-up draft with inline
comments, and a one-paragraph summary of the top changes needed.

## What "done" looks like
- Every paragraph has been reviewed.
- Banned words are flagged with a suggested replacement.
- The summary calls out the top three issues by frequency.
- Both files are saved in results/ and are not empty.

## How to check
- Confirm both files exist in results/ and are non-empty.
- Re-read the marked-up draft; verify every banned word has a fix.
- If any check fails, fix and recheck. Three tries max.

The full anatomy — frontmatter schema, the output manifest, parameters, and the two evaluation styles — is in Writing runbooks.
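
To make the split between machine-readable settings and the markdown body concrete, here is a minimal sketch that separates the two in a file shaped like the example above. It is an assumption-laden illustration, not Jetty's parser: `split_runbook` is a hypothetical helper, and it only handles flat `key: value` lines, which is all the example uses. A real frontmatter schema would need a proper YAML library.

```python
from typing import Dict, Tuple


def split_runbook(text: str) -> Tuple[Dict[str, str], str]:
    """Split a runbook into (frontmatter settings, markdown body).

    Hypothetical helper: handles only flat `key: value` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        # Index of the closing '---' delimiter.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated block: treat everything as body
    settings: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return settings, body


RUNBOOK = """---
agent: claude-code
model: anthropic/claude-sonnet-4.6
evaluation: rubric
---

# Brand voice review

## The job
Read the uploaded draft.
"""

settings, body = split_runbook(RUNBOOK)
print(settings["agent"])       # claude-code
print(body.splitlines()[0])    # # Brand voice review
```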

Why markdown

You can read it. Six months from now you'll still know what it does.
You can share it. Anyone can open and edit a markdown file. Nothing to install.
You can version it. Keep it in Git, see what changed, roll back.
You can hand it to any agent. A runbook written in plain English works with whichever model you point it at. You own the instructions; the model is replaceable. This is the same tech-agnostic stance behind the runtimes.

How a runbook is different from…

…a prompt. A prompt is a request (“summarize this”). A runbook is a specification: the request plus what must be in the output and how to check it.

…an agent skill. A skill tells an agent how to do something. A runbook wraps a skill with the definition of done and a verification step. Skills go inside runbooks; runbooks are what you actually run.

…a workflow. A workflow is the explicit JSON DAG of steps. A runbook is the agentic form: you describe the outcome and the agent figures out the steps inside a sandbox. They compose, since runbooks can call workflow steps, and both produce trajectories.

Where runbooks live and run

When you run a runbook, Jetty provisions an isolated sandbox, runs the chosen agent inside it, and records the result as a trajectory. You can run one you wrote, one you forked from the public directory, or one deployed as a task. The runbook is the artifact that travels; everything else (the sandbox, the model, the provider) is swappable around it.

Next: write one from scratch, or fork one that already works.