Run and evaluate AI agents
without managing infrastructure.
Ship workflows that execute, evaluate, and iterate until they’re right.
What builders are shipping with Jetty
Automated code improvements
Connect your traces and get cost-saving, ready-to-merge PRs.
Code evaluation and benchmarking
Managed eval batches that hill-climb toward a quality bar.
Document processing
Extraction and transformation pipelines with built-in quality gates.
Design consistency
Audit a codebase or assets against a design system.
Anatomy of a runbook
A runbook is a spec for your AI agent.
One portable Markdown file: your agent’s job, what “done” looks like, and how it checks its own work before finishing.
Objective and output manifest
What the agent is doing and the exact files it must produce. The run isn't complete until every file exists.
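As a rough illustration of the "every file exists" rule, the completion check could look like the sketch below. The function names `missing_outputs` and `run_is_complete` are hypothetical and are not part of Jetty; the sketch also assumes the manifest paths have already had their template variables filled in.

```python
from pathlib import Path


def missing_outputs(required: list[str]) -> list[str]:
    """Return the required output files that do not exist yet."""
    return [path for path in required if not Path(path).is_file()]


def run_is_complete(required: list[str]) -> bool:
    """A run counts as complete only when no required file is missing."""
    return not missing_outputs(required)
```

Returning the list of missing files, rather than a bare boolean, gives the agent something concrete to act on in its next attempt.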
Evaluation and iteration
How the agent checks its own work: rubric scoring or programmatic validation. If the check fails, the agent retries with bounded iteration (typically 3 rounds).
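One way to picture rubric scoring with bounded iteration is the sketch below. It is not Jetty's implementation. The thresholds mirror the sample ETL runbook on this page (pass at 4.0 or higher, no criterion below 3, at most 3 rounds), and it assumes that "score" means the mean across criteria. The names `passes` and `run_with_retries` are hypothetical.

```python
from statistics import mean
from typing import Callable

PASS_MEAN = 4.0     # assumption: the overall score is the mean of all criteria
MIN_CRITERION = 3   # no single criterion may score below this
MAX_ROUNDS = 3      # bounded iteration: stop after this many attempts


def passes(scores: dict[str, int]) -> bool:
    """Apply the rubric rule: high enough on average, with no weak criterion."""
    values = list(scores.values())
    return mean(values) >= PASS_MEAN and min(values) >= MIN_CRITERION


def run_with_retries(
    attempt: Callable[[int], dict[str, int]],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[bool, int]:
    """Call attempt(round_number) until its scores pass or the rounds run out.

    Returns (passed, rounds_used).
    """
    for round_number in range(1, max_rounds + 1):
        if passes(attempt(round_number)):
            return True, round_number
    return False, max_rounds
```

Capping the loop keeps a failing run from retrying forever; the caller still learns how many rounds were spent.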
YAML frontmatter
Version, evaluation strategy (programmatic or rubric), agent, model, and snapshot environment.
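To make the frontmatter concrete, here is a minimal sketch that separates it from the Markdown body. It handles only flat `key: value` lines like those in the sample runbook; anything richer would need a real YAML parser. `parse_frontmatter` is a hypothetical helper, not a Jetty API.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a runbook into (frontmatter fields, Markdown body).

    Only flat `key: value` lines are understood. If the text does not start
    with a '---' line, or the block is never closed, no fields are returned.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Find the closing '---' delimiter after the opening one.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and optional double quotes.
            fields[key.strip()] = value.strip().strip('"')
    return fields, "\n".join(lines[end + 1:])
```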
Parameters and dependencies
Template variables injected at runtime, plus tools and skills the agent needs. The runtime checks availability before execution.
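The sketch below shows how `{{name}}` placeholders might be filled in at runtime, and how a pre-flight dependency check could work. Both helpers are hypothetical. In particular, treating "available" as "found on PATH" is an assumption; the page does not say how Jetty's runtime performs this check.

```python
import re
import shutil

# Matches {{name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, params: dict[str, str]) -> str:
    """Replace each {{name}} with params[name]; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda match: str(params[match.group(1)]), template)


def missing_tools(tools: list[str]) -> list[str]:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

For example, `render("{{results_dir}}/summary.md", {"results_dir": "/app/results"})` yields `/app/results/summary.md`. Failing loudly on an unknown placeholder surfaces a typo before the run starts.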
Steps
Sequential plain-language instructions. Each step can run code, call tools, or invoke skills.
RUNBOOK-etl-pipeline.md
---
version: "1.0.0"
evaluation: rubric
agent: claude-code
model: claude-sonnet-4-6
snapshot: python312-uv
---
# ETL Pipeline Agent
## Objective
Fetch new events, enrich each record with
AI-generated summaries, and persist results
to the output table.
## REQUIRED OUTPUT FILES
| File |
|------|
| {{results_dir}}/validation_report.json |
| {{results_dir}}/summary.md |
| {{results_dir}}/enriched_events.csv |
## Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| source_table | raw.events | Input |
| results_dir | /app/results | Output |
| batch_size | 100 | Records per batch |
## Step 1: Fetch Records
Query source_table for new rows since the
last checkpoint. Process in batches.
## Step 2: Enrich
For each batch, invoke the Summarizer skill
to generate a summary for each record.
## Step 3: Write Results
Persist enriched records to enriched_events.csv.
## Evaluation
| # | Criterion | 5 (Pass) | 1 (Fail) |
|---|-----------|----------|----------|
| 1 | Completeness | All rows enriched | Missing rows |
| 2 | Quality | Summaries coherent | Gibberish |
| 3 | Schema | Valid CSV output | Malformed |
Pass if score ≥ 4.0, no criterion below 3.
## Iteration
If evaluation fails, retry with bounded
iteration. Max 3 rounds.
Build runbooks using...
skills.sh is a portable skill manager that works across agents. One command installs the Jetty skill from the official repo.
npx skills add https://github.com/jettyio/jettyio-skills --skill
Then export your token so the skill can authenticate:
export JETTY_API_TOKEN=mlc_your_token