Run and evaluate AI agents
without managing infrastructure.

Ship workflows that execute, evaluate, and iterate until they’re right.

What builders are shipping with Jetty

Automated code improvements

Connect your traces to find cost savings and get ready-to-merge PRs.

Code evaluation and benchmarking

Managed eval batches that hill-climb toward a quality bar.

Document processing

Extraction and transformation pipelines with built-in quality gates.

Design consistency

Audit a codebase or assets against a design system.

Anatomy of a runbook

A runbook is a spec for your AI agent.

One portable Markdown file: your agent’s job, what “done” looks like, and how it checks its own work before finishing.

Objective and output manifest
What the agent is doing and the exact files it must produce. The run isn't complete until every file exists.
Evaluation and iteration
How the agent checks its own work: rubric scoring or programmatic validation. If the check fails, the agent retries, up to a fixed number of rounds (typically 3).
YAML frontmatter
Version, evaluation strategy (programmatic or rubric), agent, model, and snapshot environment.
Parameters and dependencies
Template variables injected at runtime, plus tools and skills the agent needs. The runtime checks availability before execution.
Steps
Sequential plain-language instructions. Each step can run code, call tools, or invoke skills.
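
To make the parameter and output-manifest ideas concrete, here is a minimal Python sketch. It assumes the `{{name}}` placeholder syntax that appears in the example runbook; the helper names `render` and `missing_outputs` are hypothetical and are not part of Jetty's API, whose runtime is not shown on this page.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration only; not Jetty's actual runtime.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, params: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its parameter value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            # Fail early, before execution, if a parameter is not supplied.
            raise KeyError(f"missing parameter: {name}")
        return params[name]

    return PLACEHOLDER.sub(substitute, template)


def missing_outputs(manifest: list[str], params: dict[str, str]) -> list[str]:
    """Return the rendered manifest paths that do not exist on disk yet."""
    rendered = [render(entry, params) for entry in manifest]
    return [path for path in rendered if not Path(path).is_file()]
```

Under this sketch, a run would count as complete only when `missing_outputs(manifest, params)` returns an empty list.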
RUNBOOK-etl-pipeline.md
---
version: "1.0.0"
evaluation: rubric
agent: claude-code
model: claude-sonnet-4-6
snapshot: python312-uv
---

# ETL Pipeline Agent

## Objective
Fetch new events, enrich each record with
AI-generated summaries, and persist results
to the output table.

## REQUIRED OUTPUT FILES
| {{results_dir}}/validation_report.json |
| {{results_dir}}/summary.md |
| {{results_dir}}/enriched_events.csv |

## Parameters
| source_table | raw.events | Input |
| results_dir  | /app/results | Output |
| batch_size   | 100 | Records per batch |

## Step 1: Fetch Records
Query source_table for new rows since the
last checkpoint. Process in batches.

## Step 2: Enrich
For each batch, invoke the Summarizer skill
to generate a summary for each record.

## Step 3: Write Results
Persist enriched records to enriched_events.csv.

## Evaluation
| # | Criterion | 5 (Pass) | 1 (Fail) |
|---|-----------|----------|----------|
| 1 | Completeness | All rows enriched | Missing rows |
| 2 | Quality | Summaries coherent | Gibberish |
| 3 | Schema | Valid CSV output | Malformed |

Pass if score ≥ 4.0, no criterion below 3.

## Iteration
If evaluation fails, retry with bounded
iteration. Max 3 rounds.
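
The pass rule and retry limit in the runbook above can be sketched in Python as follows. This is an illustration under stated assumptions, not Jetty's implementation: it assumes "score" means the mean across criteria, and the names `passes` and `run_with_retries` are hypothetical.

```python
from statistics import mean
from typing import Callable

PASS_MEAN = 4.0     # "Pass if score >= 4.0" (assumed to be the mean score)
MIN_CRITERION = 3   # "no criterion below 3"
MAX_ROUNDS = 3      # "Max 3 rounds"


def passes(scores: dict[str, int]) -> bool:
    """Apply the pass rule to one set of per-criterion rubric scores."""
    values = list(scores.values())
    return mean(values) >= PASS_MEAN and min(values) >= MIN_CRITERION


def run_with_retries(attempt: Callable[[int], dict[str, int]]) -> tuple[bool, int]:
    """Call attempt(round_number) until its scores pass or rounds run out.

    Returns (passed, rounds_used).
    """
    for round_number in range(1, MAX_ROUNDS + 1):
        if passes(attempt(round_number)):
            return True, round_number
    return False, MAX_ROUNDS
```

Here `attempt` stands in for one full agent run plus rubric scoring. Capping the loop at `MAX_ROUNDS` is what keeps the iteration bounded.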

Build runbooks using...

skills.sh is a portable skill manager that works across agents. One command installs the Jetty skill from the official repo.

npx skills add https://github.com/jettyio/jettyio-skills --skill

Then export your token so the skill can authenticate:

export JETTY_API_TOKEN=mlc_your_token

Build repeatable tasks with Jetty.