How to build an AI agent that actually works when you're not watching.

A guide for people who already know how useful AI can be, and want to turn one of their workflows into something reliable.


Before you start

This is not a tutorial about how to write a perfect prompt. There are a thousand of those, and they all have the same problem: a perfect prompt still fails the moment someone else runs it, or the model changes, or the input is weird.

This is about how to build something that keeps working. For your team. Over time. Without you in the room.

The short version: write down what you want, write down what “done” looks like, add a small check, run it, fix what's broken. Repeat. That's an AI agent you can trust.

Here are the steps.


Step 1 — Pick something you already do.

Not something impressive. Not something novel. Something you do every week and find tedious.

Good candidates:

  • The brand voice review on every piece of marketing content.
  • The weekly report where you pull the same five metrics from the same three tools.
  • The client-brief intake where you take a form and turn it into a project plan.
  • The pre-meeting prep where you summarize the last three email threads with a customer.

The test is: if you stopped doing this task, would someone on your team have to learn it the hard way? If yes, it's a good candidate. You already know the task; the runbook just has to capture what you know.

Bad candidates for your first runbook:

  • Something you've never done before.
  • Something that changes every time.
  • Something where the “right answer” depends on taste alone.

Start with a task that has a shape. You can graduate to the judgment calls later.


Step 2 — Write it down as if for a new hire.

Open a blank document. Describe the task. The way you'd describe it to someone joining your team tomorrow who has to do it without you hovering.

Write what you'd read in an SOP. Not “make it good” — which tool, which file, what order, what tone.

A short example:

Weekly competitive summary

Every Monday, produce a one-page summary of what our top five competitors shipped the prior week.

For each competitor:

  • Check their blog and their changelog
  • Pull any feature launches, pricing changes, or executive announcements
  • Note the source URL

Produce a single page with one short paragraph per competitor. No filler, no speculation. If a competitor didn't ship anything, write “no update.”

This is not a prompt. It's a briefing note. The AI will read this as its job.


Step 3 — Name what “done” looks like.

Here's the part most people skip. An AI will do the task and declare itself finished. What you need is a checklist of what makes the task actually finished.

For the competitive summary:

Done means:

  • All five competitors are covered, in the same order every week.
  • Every claim has a source URL.
  • If a competitor has no update, the summary says so explicitly (it doesn't just omit them).
  • The summary fits on one page.
  • The tone is neutral — no adjectives, no speculation.

Be specific. Be grumpy. Write down the things that would make you reject the output if a coworker handed it to you. Those are your standards.

If you can't say what “done” looks like, your first run will tell you. Read the output. Look at what's wrong. That's your standard. Add it to the runbook.
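
You don't need code for any of this. But if you like seeing things concretely, the checklist above translates almost line for line into small mechanical checks. Here's a rough Python sketch. The competitor names, the word count standing in for “one page,” and the “a launch needs a URL” rule are all placeholder assumptions; the tone standard still needs a human (or the model itself) to judge.

  # The "done" checklist from above, written as checks that either pass or fail.
  # Everything specific here is illustrative: swap in your own competitors,
  # your own page limit, your own rules.

  COMPETITORS = ["Acme", "Globex", "Initech", "Umbrella", "Hooli"]  # placeholder names

  def all_competitors_covered(summary: str) -> bool:
      # Every competitor gets a section, and they appear in the agreed order.
      positions = [summary.find(name) for name in COMPETITORS]
      return -1 not in positions and positions == sorted(positions)

  def every_launch_has_a_source(summary: str) -> bool:
      # Crude proxy: any line that mentions a launch should also carry a URL.
      return all("http" in line.lower()
                 for line in summary.splitlines()
                 if "launch" in line.lower())

  def fits_on_one_page(summary: str, max_words: int = 450) -> bool:
      # "One page" approximated as a word count; tune the limit to your template.
      return len(summary.split()) <= max_words

  CHECKS = [all_competitors_covered, every_launch_has_a_source, fits_on_one_page]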


Step 4 — Add one small check.

Before the AI declares itself done, it should confirm it actually hit the standards. This is the part that turns a one-shot prompt into something reliable.

The checks don't have to be fancy. For the competitive summary, they might be:

Before declaring done:

  • Confirm all five competitor sections are present.
  • Confirm every “launch” claim has a URL.
  • Confirm no section is empty.
  • If any check fails, fix the output and check again. Three tries max.

If something fails the check, the AI tries again. If it can't fix it after three tries, it stops and tells you. It doesn't quietly ship something wrong.
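
Again, optional, but here's what that loop looks like sketched in Python. The run_ai function is a stand-in for however the draft actually gets produced (it isn't a real API), and checks is the list of pass/fail functions from the Step 3 sketch. The shape is what matters: run, check against the standards, retry with specific feedback, and after three tries stop and say so.

  MAX_TRIES = 3  # matches the runbook: three tries, then stop and escalate

  def produce_summary(run_ai, checks, runbook: str) -> str:
      # run_ai: placeholder for whatever turns instructions into a draft.
      # checks: the list of pass/fail functions from the Step 3 sketch.
      feedback = ""
      for _ in range(MAX_TRIES):
          draft = run_ai(runbook + feedback)
          failed = [check.__name__ for check in checks if not check(draft)]
          if not failed:
              return draft  # every standard met: actually done
          # Tell the model exactly which standards it missed, then try again.
          feedback = "\n\nThe previous draft failed these checks: " + ", ".join(failed)
      # Still failing after three tries: stop and flag it, don't quietly ship it.
      raise RuntimeError("Gave up after %d tries; still failing: %s"
                         % (MAX_TRIES, ", ".join(failed)))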

This is where trust comes from. Not from the AI being perfect — from the process catching mistakes before you do.


Step 5 — Run it and see what happens.

You're not going to write a perfect runbook on the first try. Nobody does.

Run it. Read the output. Find the one thing that's off — usually something small, like the tone is too formal, or a section is missing because the URL 404'd and nobody said what to do in that case.

Now fix the runbook, not the AI. Add the rule you forgot. Say what to do when a URL is dead.

Run it again. It will be better. Run it a third time. Somewhere around the third or fourth run, the runbook will be good enough that you'd trust it without watching.

This is the most important thing on this page:

You don't build a reliable AI agent by writing a perfect prompt. You build it by running an imperfect one, watching where it falls short, and updating the instructions.

Every run makes the runbook better. Every week you use the runbook, you find one more thing to tighten. After a quarter, the runbook is better than you were when you started — because every lesson from every run is baked in.


Step 6 — Share it.

Put the runbook somewhere your team can see it. Start with the one coworker who'd benefit most. Let them run it. Watch what happens when someone who isn't you uses it.

They'll find things you didn't. “It crashes when the blog URL redirects.” “It missed that one competitor renamed themselves.” Those are the lessons that turn a solo runbook into a shared one. Add them.

A year from now, you'll have a folder of runbooks. Each one is someone's workflow, captured, checked, and runnable by anyone on the team. That's an AI operating system for your organization. It didn't come from a consulting engagement. It came from you writing down what you already knew.


A note on trust

Trust between a person and an AI gets built the same way trust between two coworkers does: by making mistakes and fixing them. You'll write a runbook. It will mess something up. You'll see where. You'll update it. It won't mess that thing up again.

After enough of those cycles, you stop watching. That's the win.

This is why Jetty leans into “run, check, fix, rerun” instead of promising the AI will be right the first time. Any system that claims the AI is always right is either lying to you or not running the AI enough to notice.


Ready to start?

Start your first runbook →

Bring a real task from your week. We'll turn it into a runbook together. You don't need to know the format — we'll walk you through it.

Book a 20-minute walkthrough →

If you'd rather pair with Jon on your first one, pick a time.

