CI integration

A Jetty eval is a quality gate you can put in front of a merge. Your CI job triggers a runbook, reads the eval result off the run, and fails the build if the pass rate or score dropped below a bar you set. The same independent grader runs on every commit, so a change that quietly degrades quality stops the pipeline instead of shipping.

Two ways to call from CI

  • The SDK. Install @jetty/sdk and call runAndWait, which submits the run, polls until it completes, and returns a fully resolved trajectory. The trajectory carries structured outputs, so you can read the eval result straight off it. See the SDK reference.
  • A raw curl. If you'd rather not add a Node dependency, hit the chat-completions runbook endpoint directly. It's a small, OpenAI-compatible surface. See the API reference.

GitHub Actions: eval as a quality gate

This job installs the SDK, runs the eval runbook with runAndWait, reads the recorded pass rate off the returned trajectory, and exits non-zero if it's under the bar. Drop it in .github/workflows/jetty-eval.yml.

name: jetty-eval
on:
  pull_request:
  push:
    branches: [main]

jobs:
  eval:
    runs-on: ubuntu-latest
    env:
      JETTY_API_TOKEN: ${{ secrets.JETTY_API_TOKEN }}
      PASS_BAR: "0.80"            # fail the build below 80% pass rate
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm install @jetty/sdk
      - run: node scripts/jetty-gate.mjs

The gate script

runAndWait returns the trajectory; read the eval result off it and compare to PASS_BAR. A failed comparison is a non-zero exit, which fails the CI step.

// scripts/jetty-gate.mjs
import { JettyClient } from "@jetty/sdk";

const jetty = new JettyClient();               // reads JETTY_API_TOKEN from env
const PASS_BAR = Number(process.env.PASS_BAR ?? "0.8");

const run = await jetty.runAndWait("acme", "triage-eval", {
  vars: { split: "regression" },
});

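// The eval result is recorded on the trajectory.
const passRate = run.eval?.passRate;           // 0..1

// Fail closed: `undefined < PASS_BAR` and `x < NaN` are both false, so a
// missing or non-numeric value would otherwise slip past the check below.
if (!Number.isFinite(passRate) || !Number.isFinite(PASS_BAR)) {
  console.error("❌ pass rate or PASS_BAR is not a number; failing the build");
  process.exit(1);
}
console.log(`pass rate: ${(passRate * 100).toFixed(1)}%  (bar: ${(PASS_BAR * 100).toFixed(1)}%)`);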

if (passRate < PASS_BAR) {
  console.error(`❌ eval below bar — failing the build`);
  process.exit(1);
}
console.log("✅ eval passed");

The exact shape of the eval result on the trajectory depends on whether the runbook's eval is programmatic (a pass rate) or rubric (a score). Read it the same way you read any step output; see the SDK reference for the trajectory surface.
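
As a minimal sketch of handling either shape, the helper below normalizes a trajectory's eval result into a single number. Only eval.passRate appears in the gate script; the eval.score field, the "programmatic"/"rubric" labels, and the helper name readEvalMetric are assumptions for illustration, so check the SDK reference for the actual field names before relying on them.

```javascript
// Hypothetical helper: `score` is an assumed field name, not confirmed SDK surface.
function readEvalMetric(run) {
  const result = run?.eval ?? {};
  // Programmatic evals: a pass rate (the field used in the gate script).
  if (Number.isFinite(result.passRate)) {
    return { kind: "programmatic", value: result.passRate };
  }
  // Rubric evals: a score (assumed field name).
  if (Number.isFinite(result.score)) {
    return { kind: "rubric", value: result.score };
  }
  // Neither field holds a number: throw so the caller cannot pass by accident.
  throw new Error("no numeric eval result found on the trajectory");
}
```

Throwing when neither field is numeric keeps the gate from passing silently if the trajectory shape changes. A rubric score may be on a different scale than a 0..1 pass rate, so the bar would need to be set per eval kind.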

Raw curl, if you prefer

No Node? Post to the chat-completions runbook endpoint and parse the result with jq. Same idea: compare to a bar, exit non-zero below it.

PASS_BAR=0.80
result=$(curl -sS --fail https://jetty.io/api/v1/run \
  -H "Authorization: Bearer $JETTY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"collection":"acme","task":"triage-eval","init_params":{}}')

rate=$(echo "$result" | jq -r '.eval.pass_rate')
awk "BEGIN { exit !($rate >= $PASS_BAR) }" \
  && echo "✅ eval passed ($rate)" \
  || { echo "❌ eval below bar ($rate < $PASS_BAR)"; exit 1; }

This is regression detection

A fixed pass bar in CI is the regression-detection idea from "Bring your own framework": set a bar, run the independent grader on every change, and flag anything that scores below it. CI enforces the bar automatically, so the build goes red when quality drops, before the change merges.
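
The bar check itself is small enough to sketch as a pure function. The example below assumes each eval result has already been reduced to a name and a number. The function findRegressions, the result shape, and the sample names and values are all hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: returns the results that fall below the bar.
// Non-numeric values count as failures so the gate fails closed.
function findRegressions(results, bar) {
  if (!Number.isFinite(bar)) {
    throw new Error("bar must be a finite number");
  }
  return results.filter(({ value }) => !Number.isFinite(value) || value < bar);
}

// Illustrative inputs only; these names and numbers are made up.
const flagged = findRegressions(
  [
    { name: "triage-eval/regression", value: 0.91 },
    { name: "triage-eval/smoke", value: 0.72 },
    { name: "triage-eval/broken", value: undefined },
  ],
  0.8,
);
console.log(flagged.map((r) => r.name));
```

A gate script could exit non-zero whenever the returned list is non-empty, which keeps the pass/fail decision separate from the code that fetches results.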


Want the eval to also run on a schedule, not just on commits? See scheduling routines →