How Jetty compares to LangGraph and Anthropic Managed Agents.
Three different bets on the same problem: how do you ship an AI agent that actually finishes the job? LangGraph says write a graph in Python. Anthropic Managed Agents says let us run it for you on Claude. Jetty says the agent is a markdown runbook, and you should be able to swap the model, the agent CLI, and the runtime without rewriting it.
The short version
Each of these tools is the right answer to a different question. The trick is picking which question you're actually asking.
- LangGraph is the right answer when your work is graph-shaped: branching, parallel workers, audit-grade state transitions, and your team is comfortable maintaining a Python framework alongside the agent itself.
- Anthropic Managed Agents is the right answer when you've decided Claude is the model and you want Anthropic to run the harness — sandbox, long-running sessions, MCP wiring — and bill you for the privilege.
- Jetty is the right answer when the work is judgment-shaped — content review, document extraction, code review, eval pipelines — and you want the agent definition to be a markdown file that works on Claude today, GPT next month, and whatever ships next year.
These can compose. Jetty can run as a step inside a LangGraph or Temporal pipeline when the larger workflow really is graph-shaped. The point of this page is to help you tell which one each piece of your work actually needs.
Side-by-side
| Dimension | Jetty | LangGraph | Managed Agents |
|---|---|---|---|
| Agent definition | Markdown runbook | Python graph (code) | Agent config in Anthropic's service |
| Models supported | 100+ providers via OpenAI-compatible API | Any model (you wire it up) | Claude only |
| Agent runtimes | Claude Code, Codex, Gemini CLI | Custom Python agents | Anthropic-built harness |
| Where it runs | Managed sandbox + Temporal backend | You self-host (or LangGraph Cloud) | Anthropic-hosted |
| Durable execution | Built-in (Temporal-backed) | Built-in (checkpointing) | Built-in (long-running sessions) |
| Trajectory capture | Native, replayable, comparable across models | Via LangSmith | End-to-end tracing |
| Eval-driven quality gates | Native LLM-as-judge step type | Build your own | Build your own |
| Optimize loop | One-click runbook tuning from a failed trajectory | Not built-in | Not built-in |
| Pricing model | Pay per run; no infra to host | Self-host costs + LLM tokens | LLM tokens + $0.08/session-hour |
| Vendor portability | High — runbook is markdown | Medium — LangChain abstractions help | Low — Claude only |
Jetty vs LangGraph
What LangGraph actually is
LangGraph is a Python framework for building agents as state machines. You write nodes, you draw the edges, you ship a graph. It has durable execution via checkpointing — agents survive restarts and pick up where they left off — and deep observability through LangSmith. It hit 1.0 in late 2025 and has serious production mileage with companies like Klarna, Uber, and LinkedIn.
Where LangGraph is the right tool
When the work is genuinely graph-shaped — branching across parallel workers, complex state transitions, audit-grade replay of every transition — LangGraph gives you the right primitives. If you have engineering capacity to maintain a Python framework alongside the agent itself, the structure pays for itself.
Where it gets heavy
For work that's mostly judgment-shaped — review this PR, extract these fields, score this output — LangGraph is more framework than the job justifies. A graph in code is harder to share with a non-engineer than a markdown file. It couples the agent definition to a runtime. And to swap models or providers, you rewrite the wiring.
What Jetty does differently
The agent is a markdown file. There's no graph to wire up, no Python class coupled to a vendor SDK, no worker process to run. You hand Jetty the runbook and the input files; Jetty provisions a sandbox, installs the agent CLI of your choice, and runs the work. The trajectory comes back ready to compare across models.
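To make "the agent is a markdown file" concrete, here is one hypothetical shape a runbook could take. The headings, step layout, and judge wording are illustrative only, not Jetty's documented format; see What is a runbook? for the real thing.

```markdown
# Invoice extraction

## Inputs
- `invoice.pdf`: the document to extract from

## Steps
1. Read the PDF and locate vendor name, invoice number, line items, and total.
2. Write the extracted fields to `result.json`, one key per field.
3. Re-read `result.json` against the PDF and correct any mismatched values.

## Judge
Score 0-100 on field accuracy and completeness; fail the run below 70.
```

Because it is plain markdown, the same file can be reviewed in a PR by a non-engineer and handed unchanged to a different model or agent CLI.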
We don't pretend Jetty replaces LangGraph for the work LangGraph is good at. It doesn't. It covers the work where LangGraph's weight is overkill — which, in our experience, is most of what people actually want to ship.
Jetty vs Anthropic Managed Agents
What Managed Agents actually is
Anthropic launched Managed Agents in April 2026. You define an agent — tools, guardrails, system prompt — and Anthropic runs it: long-running sessions that stretch across hours, sandboxed code execution, scoped permissions, end-to-end tracing, and MCP-based connections to third-party services. Pricing is standard token rates plus $0.08 per session-hour while a session is running. Notion, Rakuten, Sentry, Asana, and Atlassian are early customers.
Where Managed Agents is the right tool
When you've already decided Claude is the model and you want Anthropic to own the runtime, Managed Agents is the cleanest path. No infrastructure to manage, first-party tracing, and Anthropic ships the harness improvements. If your product is Claude-shaped and your buyers are comfortable with that, the ergonomics are excellent.
Where the lock-in shows up
The agent definition lives in Anthropic's service. The model is Claude. If Sonnet's pricing changes, if Opus 5 ships and behaves differently, if Gemini's next release is twice as good for your workload — you're rewriting against a different vendor. The work you put into tuning your agent doesn't come with you.
You also can't read your agent definition outside Anthropic's console. It's not a file in your repo. It's not something a non-engineer can review or PR.
What Jetty does differently
The runbook is a markdown file in your repo. The model is configurable per run — Claude, GPT, Gemini, anything OpenAI-compatible. The agent CLI is configurable too. When the pricing curve shifts or a better model lands, you change one field. Same runbook, same trajectory format, same sandbox. The work survives the vendor churn.
And because Jetty speaks the OpenAI Chat Completions protocol with a small Jetty extension, your existing SDK code keeps working. You add the runbook block when you want agent execution; you take it out when you want passthrough.
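A minimal sketch of what that looks like on the wire. The request body is standard Chat Completions; the `runbook` key and its shape are hypothetical stand-ins for the extension field, not Jetty's documented schema, and the model id is illustrative.

```python
import json

# Standard Chat Completions request body. Sent without the extension,
# this is a plain passthrough request to the configured provider.
payload = {
    "model": "claude-sonnet-4-5",  # illustrative model id; configurable per run
    "messages": [
        {"role": "user", "content": "Extract the invoice fields from the attached PDF."}
    ],
}

# Hypothetical extension: attaching a runbook block switches the same
# request from passthrough to sandboxed agent execution. Key name and
# shape are assumptions for the sketch.
payload["runbook"] = {"task": "invoice-extraction", "version": 3}

body = json.dumps(payload)
```

Swapping models between runs is then literally editing the `model` field; nothing else in the request changes.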
The thing neither of them has: one-click optimize
This is the feature we're proudest of, because it closes a loop that framework-shaped tools and vendor-shaped tools both leave open.
On every Jetty run, you get a trajectory: full step history, inputs, outputs, tokens, cost, agent logs, score. When a run goes sideways — a step fails, the judge scores it low, the iteration count maxes out — you click Optimize on the trajectory page.
What happens next:
- Jetty spins up a sandbox preloaded with three things: the task's current runbook, the failing trajectory's full step history, and the trajectory's original input files.
- A Claude Code agent runs `/optimize-runbook` non-interactively against those inputs, citing the specific steps that motivated each change.
- You see a structured analysis and a proposed diff to your runbook, streamed live.
- You can accept, reject individual hunks, or chat with the agent to refine.
- On Accept & Re-run, the candidate runbook gets PUT back to the task (version bumped) and re-run against the original trajectory's inputs. You land on a side-by-side view: Score 58 → 76 (+18).
- If the new version isn't better, one click reverts it.
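The PUT-back step can be sketched as a plain HTTP request. The endpoint, path, and headers below are hypothetical, not Jetty's documented API; the point is only that the candidate runbook travels as plain markdown and the task versions it on write.

```python
from urllib.request import Request

# Candidate runbook produced by the optimize step (truncated for the sketch).
candidate = "# Invoice extraction (v4)\n\n## Steps\n1. ...\n"

# Build (but don't send) a PUT against a hypothetical task endpoint.
req = Request(
    url="https://api.jetty.example/v1/tasks/invoice-extraction/runbook",
    data=candidate.encode("utf-8"),
    method="PUT",
    headers={"Content-Type": "text/markdown"},
)
```

Because the body is the markdown file itself, the same version-bumped content can live in your repo and be diffed like any other file.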
Both LangGraph and Managed Agents capture rich traces. Neither closes the loop from the trace back to the agent definition automatically. With LangGraph, the agent is code — a teammate has to read the trace, write the change, test it, and ship it. With Managed Agents, the agent definition is in Anthropic's console — same human round trip, plus a vendor in the middle.
With Jetty, the agent is a runbook file, and the loop from “trajectory says this failed” to “runbook is updated and re-run” is one click. That's how you actually get from a fragile first-pass agent to one you trust without watching.
Read more on the thinking behind this in What is a runbook? and Runbooks for Agents.
Which to pick
Pick LangGraph if you have a graph-shaped problem (branching, parallel workers, complex state) and an engineering team that's comfortable owning a Python framework. Or if your existing stack already runs LangChain and you want to stay in that ecosystem.
Pick Managed Agents if Claude is your model, your product is comfortable being Claude-shaped, and you want Anthropic to run the harness. You accept the vendor lock-in trade in exchange for not running infrastructure.
Pick Jetty if the work is judgment-shaped, you want the agent definition to live in your repo as a markdown file, you want to swap models between runs without rewriting, and you want the trajectory-to-runbook optimization loop closed for you. Jetty is the lighter primitive — and for most of the agentic work people actually ship, lighter is what fits.
These also compose. If your core pipeline is Temporal or LangGraph, Jetty can be a step inside it — the runbook handles the judgment-shaped work, the framework handles everything else.
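A sketch of that composition, assuming a hypothetical `run_jetty` client helper (stubbed here so the example is self-contained). In a real pipeline this step function would be registered as a LangGraph node or a Temporal activity, and the surrounding framework would own the branching.

```python
def run_jetty(runbook: str, inputs: list, model: str) -> dict:
    """Stub standing in for a real Jetty API call; returns a canned trajectory."""
    return {"score": 76, "trajectory_id": "trj_example"}


def jetty_review_step(state: dict) -> dict:
    """Pipeline step: hand the judgment-shaped work to a runbook, pass the rest through."""
    result = run_jetty(
        runbook="runbooks/code-review.md",       # illustrative path
        inputs=state["files"],
        model=state.get("model", "claude-sonnet-4-5"),  # illustrative model id
    )
    return {**state, "review_score": result["score"]}
```

The framework never needs to know what happened inside the sandbox; it sees a state update like any other node's.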
Honest caveats
- Agent-runtime support is what it is. Claude Code, Codex, and Gemini CLI are first-class agent runtimes today. New agents get added as they emerge. If you need something exotic, ask us.
- Sandbox snapshots are opinionated. We ship Python, browser automation, and a general-purpose image. Custom snapshots are supported, but aren't a one-click experience yet.
- Jetty is primarily an API. Spot is for browsing runs and trajectories — not for building runbooks graphically. Runbooks belong in your editor, in your repo.
- Model-agnostic doesn't mean model-identical. Different models score differently on the same runbook. Jetty captures every trajectory so you can see the differences on your real work and pick what fits.
Try it on a workflow you're currently rewriting
Book a 20-minute walkthrough →
Bring a workflow you've already built on LangGraph, Managed Agents, or your own glue code. We'll show you what it looks like as a runbook — and what the optimize loop catches.
About ten minutes from sign-up to first run.
Related reading
- What is a runbook? — The format underneath everything Jetty does.
- Model- and agent-agnostic — Why portability across vendors is the load-bearing property.
- The folder is the agent — The thesis: markdown outlives runtimes.
- For builders — Technical details, API shape, what you get out of the box.