How Jetty compares to LangGraph and Anthropic Managed Agents.

Three different bets on the same problem: how do you ship an AI agent that actually finishes the job? LangGraph says write a graph in Python. Anthropic Managed Agents says let us run it for you on Claude. Jetty says the agent is a markdown runbook, and you should be able to swap the model, the agent CLI, and the runtime without rewriting it.


The short version

Each of these tools is the right answer to a different question. The trick is picking which question you're actually asking.

  • LangGraph is the right answer when your work is graph-shaped: branching, parallel workers, audit-grade state transitions, and your team is comfortable maintaining a Python framework alongside the agent itself.
  • Anthropic Managed Agents is the right answer when you've decided Claude is the model and you want Anthropic to run the harness — sandbox, long-running sessions, MCP wiring — and bill you for the privilege.
  • Jetty is the right answer when the work is judgment-shaped — content review, document extraction, code review, eval pipelines — and you want the agent definition to be a markdown file that works on Claude today, GPT next month, and whatever ships next year.

These can compose. Jetty can run as a step inside a LangGraph or Temporal pipeline when the larger workflow really is graph-shaped. The point of this page is to help you tell which one each piece of your work actually needs.


Side-by-side

| Dimension | Jetty | LangGraph | Managed Agents |
|---|---|---|---|
| Agent definition | Markdown runbook | Python graph (code) | Agent config in Anthropic's service |
| Models supported | 100+ providers via OpenAI-compatible API | Any model (you wire it up) | Claude only |
| Agent runtimes | Claude Code, Codex, Gemini CLI | Custom Python agents | Anthropic-built harness |
| Where it runs | Managed sandbox + Temporal backend | You self-host (or LangGraph Cloud) | Anthropic-hosted |
| Durable execution | Built-in (Temporal-backed) | Built-in (checkpointing) | Built-in (long-running sessions) |
| Trajectory capture | Native, replayable, comparable across models | Via LangSmith | End-to-end tracing |
| Eval-driven quality gates | Native LLM-as-judge step type | Build your own | Build your own |
| Optimize loop | One-click runbook tuning from a failed trajectory | Not built-in | Not built-in |
| Pricing model | Pay per run; no infra to host | Self-host costs + LLM tokens | LLM tokens + $0.08/session-hour |
| Vendor portability | High (runbook is markdown) | Medium (LangChain abstractions help) | Low (Claude only) |

Jetty vs LangGraph

What LangGraph actually is

LangGraph is a Python framework for building agents as state machines. You write nodes, you draw the edges, you ship a graph. It has durable execution via checkpointing — agents survive restarts and pick up where they left off — and deep observability through LangSmith. It hit 1.0 in late 2025 and has serious production mileage with companies like Klarna, Uber, and LinkedIn.
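To make the nodes-and-edges picture concrete, here is roughly the shape in a few lines. The node logic is placeholder work we invented for illustration; the StateGraph, checkpointer, and thread_id plumbing is LangGraph's real API:

```python
# A minimal LangGraph state machine: typed state, one node, explicit edges.
# The checkpointer is the durable-execution piece: state is saved at each
# step under a thread_id, so a restarted process resumes where it left off.
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    draft: str
    approved: bool

def review(state: State) -> dict:
    # Placeholder judgment; a real node would call a model or a tool.
    return {"approved": "TODO" not in state["draft"]}

builder = StateGraph(State)
builder.add_node("review", review)
builder.add_edge(START, "review")
builder.add_edge("review", END)

graph = builder.compile(checkpointer=MemorySaver())
result = graph.invoke(
    {"draft": "ship it", "approved": False},
    config={"configurable": {"thread_id": "run-1"}},  # resume key
)
```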

Where LangGraph is the right tool

When the work is genuinely graph-shaped — branching across parallel workers, complex state transitions, audit-grade replay of every transition — LangGraph gives you the right primitives. If you have engineering capacity to maintain a Python framework alongside the agent itself, the structure pays for itself.

Where it gets heavy

For work that's mostly judgment-shaped — review this PR, extract these fields, score this output — LangGraph is more framework than the job justifies. A graph in code is harder to share with a non-engineer than a markdown file. It couples the agent definition to a runtime. And to swap models or providers, you rewrite the wiring.

What Jetty does differently

The agent is a markdown file. There's no graph to wire up, no Python class coupled to a vendor SDK, no worker process to run. You hand Jetty the runbook and the input files; Jetty provisions a sandbox, installs the agent CLI of your choice, and runs the work. The trajectory comes back ready to compare across models.
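For flavor, a runbook might look something like the sketch below. The front-matter field names are our illustration, not Jetty's documented schema; the point is that the entire agent definition is one reviewable file in your repo:

```markdown
---
# Hypothetical front matter; field names are illustrative, not a schema.
model: claude-sonnet-4-5    # swap per run without touching the steps below
agent: claude-code
---
# PR review runbook

## Goal
Read the diff in `input/` and write `review.json` with a verdict and notes.

## Steps
1. Read every changed file; flag correctness bugs, style drift, missing tests.
2. Write findings to `review.json` as `{"verdict": ..., "notes": [...]}`.

## Quality gate
An LLM judge scores the review for specificity. Below 70, revise and retry.
```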

We don't pretend Jetty replaces LangGraph for the work LangGraph is good at. It doesn't. It covers the work where LangGraph's weight is overkill — which, in our experience, is most of what people actually want to ship.


Jetty vs Anthropic Managed Agents

What Managed Agents actually is

Anthropic launched Managed Agents in April 2026. You define an agent — tools, guardrails, system prompt — and Anthropic runs it: long-running sessions that stretch across hours, sandboxed code execution, scoped permissions, end-to-end tracing, and MCP-based connections to third-party services. Pricing is standard token rates plus $0.08 per session-hour while a session is running. Notion, Rakuten, Sentry, Asana, and Atlassian are early customers.

Where Managed Agents is the right tool

When you've already decided Claude is the model and you want Anthropic to own the runtime, Managed Agents is the cleanest path. No infrastructure to manage, first-party tracing, and Anthropic ships the harness improvements. If your product is Claude-shaped and your buyers are comfortable with that, the ergonomics are excellent.

Where the lock-in shows up

The agent definition lives in Anthropic's service. The model is Claude. If Sonnet's pricing changes, if Opus 5 ships and behaves differently, if Gemini's next release is twice as good for your workload — you're rewriting against a different vendor. The work you put into tuning your agent doesn't come with you.

You also can't read your agent definition outside Anthropic's console. It's not a file in your repo. It's not something a non-engineer can review or PR.

What Jetty does differently

The runbook is a markdown file in your repo. The model is configurable per run — Claude, GPT, Gemini, anything OpenAI-compatible. The agent CLI is configurable too. When the pricing curve shifts or a better model lands, you change one field. Same runbook, same trajectory format, same sandbox. The work survives the vendor churn.

And because Jetty speaks the OpenAI Chat Completions protocol with a small Jetty extension, your existing SDK code keeps working. You add the runbook block when you want agent execution; you take it out when you want passthrough.
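A sketch with the OpenAI Python SDK, under stated assumptions: the base URL and the exact shape of the "jetty" extension field are illustrative, but extra_body is the SDK's real mechanism for forwarding fields it doesn't know about:

```python
# Sketch, under assumptions: the base URL and the shape of the "jetty"
# extension field are illustrative, not documented API. extra_body is the
# real OpenAI SDK escape hatch for passing through extra request fields.
from openai import OpenAI

client = OpenAI(base_url="https://api.jetty.example/v1",  # hypothetical
                api_key="JETTY_API_KEY")

resp = client.chat.completions.create(
    model="claude-sonnet-4-5",  # or "gpt-5", "gemini-2.5-pro", ...
    messages=[{"role": "user", "content": "Review the attached diff."}],
    # With the block: agent execution against the runbook.
    # Without it: plain passthrough chat.
    extra_body={"jetty": {"runbook": "runbooks/pr-review.md"}},
)
print(resp.choices[0].message.content)
```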


The thing neither of them has: one-click optimize

This is the feature we're proudest of, because it closes a loop that framework-shaped tools and vendor-shaped tools both leave open.

On every Jetty run, you get a trajectory: full step history, inputs, outputs, tokens, cost, agent logs, score. When a run goes sideways — a step fails, the judge scores it low, the iteration count maxes out — you click Optimize on the trajectory page.

What happens next (see the sketch after this list):

  1. Jetty spins up a sandbox preloaded with three things: the task's current runbook, the failing trajectory's full step history, and the trajectory's original input files.
  2. A Claude Code agent runs /optimize-runbook non-interactively against those inputs, citing the specific steps that motivated each change.
  3. You see a structured analysis and a proposed diff to your runbook, streamed live.
  4. You can accept, reject individual hunks, or chat with the agent to refine.
  5. On Accept & Re-run, the candidate runbook gets PUT back to the task (version bumped) and re-run against the original trajectory's inputs. You land on a side-by-side view: Score 58 → 76 (+18).
  6. If the new version isn't better, one click reverts it.
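The flow above is one click in the UI, but each step maps to an API call. Purely as a hypothetical sketch, with every endpoint, ID, and field name invented for illustration rather than taken from Jetty's documented API, driving the same loop programmatically might look like this:

```python
# Hypothetical sketch only: these endpoints, IDs, and payload fields are
# invented for illustration and are not Jetty's documented API.
import requests

BASE = "https://api.jetty.example/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer JETTY_API_KEY"}

# Steps 1-3: start an optimization from the failing trajectory, then
# fetch the analysis and the proposed runbook diff once it completes.
opt = requests.post(f"{BASE}/trajectories/traj_123/optimize",
                    headers=HEADERS).json()
proposal = requests.get(f"{BASE}/optimizations/{opt['id']}",
                        headers=HEADERS).json()
print(proposal["analysis"])
print(proposal["diff"])

# Step 5: accept by PUTting the candidate runbook back to the task
# (version bump), then re-run on the original inputs and compare scores.
requests.put(f"{BASE}/tasks/{opt['task_id']}/runbook", headers=HEADERS,
             json={"content": proposal["candidate_runbook"]})
rerun = requests.post(f"{BASE}/trajectories/traj_123/rerun",
                      headers=HEADERS).json()
print(rerun["score"], "vs", proposal["baseline_score"])
```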

Both LangGraph and Managed Agents capture rich traces. Neither closes the loop from the trace back to the agent definition automatically. With LangGraph, the agent is code — a teammate has to read the trace, write the change, test it, and ship it. With Managed Agents, the agent definition is in Anthropic's console — same human round trip, plus a vendor in the middle.

With Jetty, the agent is a runbook file, and the loop from “trajectory says this failed” to “runbook is updated and re-run” is one click. That's how you actually get from a fragile first-pass agent to one you trust without watching.

Read more on the thinking behind this in What is a runbook? and Runbooks for Agents.


Which to pick

Pick LangGraph if you have a graph-shaped problem (branching, parallel workers, complex state) and an engineering team that's comfortable owning a Python framework. Or if your existing stack already runs LangChain and you want to stay in that ecosystem.

Pick Managed Agents if Claude is your model, your product is comfortable being Claude-shaped, and you want Anthropic to run the harness. You accept the lock-in trade-off in exchange for not running infrastructure.

Pick Jetty if the work is judgment-shaped, you want the agent definition to live in your repo as a markdown file, you want to swap models between runs without rewriting, and you want the trajectory-to-runbook optimization loop closed for you. Jetty is the lighter primitive — and for most of the agentic work people actually ship, lighter is what fits.

These also compose. If your core pipeline is Temporal or LangGraph, Jetty can be a step inside it — the runbook handles the judgment-shaped work, the framework handles everything else.
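A sketch of that composition, with the same caveat as above (Jetty's endpoint and extension field are hypothetical; the LangGraph API is real):

```python
# A LangGraph pipeline that delegates one judgment-shaped step to Jetty.
# The graph keeps ordering and fan-out; the runbook does the judgment.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from openai import OpenAI

jetty = OpenAI(base_url="https://api.jetty.example/v1",  # hypothetical URL
               api_key="JETTY_API_KEY")

class State(TypedDict):
    document: str
    extraction: str

def extract_fields(state: State) -> dict:
    resp = jetty.chat.completions.create(
        model="claude-sonnet-4-5",  # swappable per run
        messages=[{"role": "user", "content": state["document"]}],
        extra_body={"jetty": {"runbook": "runbooks/extract.md"}},  # assumed shape
    )
    return {"extraction": resp.choices[0].message.content}

builder = StateGraph(State)
builder.add_node("extract", extract_fields)
builder.add_edge(START, "extract")
builder.add_edge("extract", END)
pipeline = builder.compile()
```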


Honest caveats

  • Tool support is what it is. Claude Code, Codex, and Gemini CLI are first-class agent runtimes today. New agents get added as they emerge. If you need something exotic, ask us.
  • Sandbox snapshots are opinionated. We ship Python, browser automation, and a general-purpose image. Custom snapshots are supported, but aren't a one-click experience yet.
  • Jetty is primarily an API. Spot is for browsing runs and trajectories — not for building runbooks graphically. Runbooks belong in your editor, in your repo.
  • Model-agnostic doesn't mean model-identical. Different models score differently on the same runbook. Jetty captures every trajectory so you can see the differences on your real work and pick what fits.

Try it on a workflow you're currently rewriting

Book a 20-minute walkthrough →

Bring a workflow you've already built on LangGraph, Managed Agents, or your own glue code. We'll show you what it looks like as a runbook — and what the optimize loop catches.

Write your first runbook →

About ten minutes from sign-up to first run.


Related reading