Stop prompting agents. Write the loop that prompts them. Make "done" mean converged, not claimed.
loops is a small, nestable library for running an agent in a convergence loop. The loop finds the work, hands it to an agent, checks the result, records what it learned, and goes again until a gate you define says the work is finished. You write the loop once and it drives the agent, rather than prompting the agent by hand. Compose loops and DAGs both ways, run them against any model behind a one-method Engine, and watch a run in a live terminal UI.
Every iteration runs with a fresh context, so a long run never rots. Progress accumulates in git, not the chat transcript: the agent forgets between turns, the repository does not. The loop stops only when an honest gate clears: a deterministic check (the tests genuinely pass) alongside a separate judge in its own context, so the model that did the work is never the one that grades it. The gate is the core idea. It keeps a loop from declaring itself finished on a half-built job and spending tokens with nothing to show.
Where most "agent memory" recalls a conversation, this keeps your decisions consistent across long work. No vector database, no embeddings, no index to sync or let go stale. Git is the memory.
A downstream agent had to preserve one upstream decision: snapshots must start
with the exact wire tag SSv1|. That decision lived only in a git commit body,
not in the source files or the downstream task prompt. The commit was not just a
fact store; it was the thread back through the journey, what was decided, why it
was decided, and what downstream work had to honour.
| Runner | What it could read | Result |
|---|---|---|
| Memoryless graph | files plus task prompt | 0/10 preserved the contract |
| Loops Ledger | gated commit bodies plus grounding | 9/10 preserved the contract |
| Raw git dump | full git log pasted into every prompt | 10/10 on a toy log, not a real-repo operating mode |
That is the honest shape of the claim. Loops is not just git log: it is the
deterministic enforcement layer that makes agents write useful commit bodies when
work converges, then the grounding layer that reads those verified reasons back
into later fresh contexts. The value is not bare recall. A fresh agent can pull
on one thread and reconstruct how and why the repository got here. Full-log dump
is a useful sanity check on tiny histories, but on a repo with significant
history it is context rot and cost.
npm run bench:compare

import { loop, agentJob, commandSucceeds, agentCheck } from '@loops-adk/core';
// Keep working until the tests pass AND a judge agrees it matches intent.
export default loop({
name: 'build-feature',
max: 20,
body: agentJob({
prompt: (c) => `Iteration ${c.iteration}: make concrete progress on TASK.md.`,
ground: true, // read the commit log + this run's scratch files before working
}),
until: [
commandSucceeds('npm', ['test']), // ground truth
agentCheck({ question: 'Does it match TASK.md?', threshold: 0.85 }), // intent
],
commit: { subject: 'feat: TASK.md' }, // one milestone commit when it converges
});

npm i @loops-adk/core # Node >= 20

Write a loop in a .loop.ts file, then run it. loops run works from any repo that has the package installed:
loops validate your-feature.loop.ts # offline pre-flight: prints the loop's shape, no model calls
loops run your-feature.loop.ts # run it (live TUI; add --no-tui or --json for headless)

The full CLI, the flags-only mode (no file), and the offline demo are in Quick start below.
The primitives compose into something bigger than a single loop: an engineering team that builds a multi-component service, holds it coherent across components, and converges only when each piece clears a bar that one agent cannot impose on itself. That bar is a report-only review battery of distinct lenses, including a genuinely different model.
// Five report-only lenses, each a markdown persona that closes with `<confidence>N%</confidence>`.
// The adversarial lens runs on a DIFFERENT model (codex / GPT-5): any reviewer, any model.
const battery = (name) =>
reviewPanel(name, [
['adversarial', { engine: 'codex' }], // genuinely different priors
['security', { model: 'opus' }],
['correctness', { model: 'sonnet' }],
['conformance', { model: 'opus' }],
['simplicity', { model: 'haiku' }],
]);
const engineer = (name) =>
loop({
name,
body: agentJob({ agent: engineerFor(name), prompt: brief(name), ground: true }),
until: commandSucceeds('node', [`test-${name}.mjs`]), // deterministic truth
review: battery(name), // unanimous; a failing review hands its findings to the next iteration
commit: true,
max: 8,
});
export default dag({
name: 'build-service',
nodes: {
store: engineer('store'),
api: { needs: ['store'], job: engineer('api') },
serialize: { needs: ['store'], isolate: true, job: engineer('serialize') }, // parallel worktree
client: { needs: ['api', 'serialize'], job: engineer('client') },
},
isolation: 'worktree',
});

The dag is the manager (toposort + dispatch). Each node is a Converge loop: the engineer builds to its test (until), then the review battery runs in the review slot: five report-only lenses with near-disjoint blind spots, each judging the actual source against the recorded contracts and closing with a <confidence>N%</confidence>. Because a reviewer is just an AgentDef and agentCheck takes an engine and model, any reviewer runs on any model: the adversarial lens on codex (GPT-5) for a true second-model signal, the rest spread across Claude. A failing review is not a dead end: its findings thread into the next iteration as lastReview, so the engineer fixes concrete concerns: the build → review → fix-up loop, with no human in it. isolate runs engineers in parallel worktrees that land back on pass; ground: true carries the contracts only store decides (stable ids, the SSv1| wire tag) to the engineers and reviewers downstream.
A single autonomous agent grades its own homework. This team structurally cannot: "done" means past an independent, multi-lens, multi-model review battery it never applies to itself. That enforced honest-convergence gate is the deepest idea here; memory is one free pillar underneath it. The whole team (engineers and reviewers) is a folder of markdown personas plus the wiring above, runnable in examples/build-service.loop.ts.
Agents rarely nail it in one shot. The reliable pattern is a convergence loop: do a bit of work, check whether you're actually done, and if not, go again. Two things make or break it, and loops is built around both:
- A fresh context every turn. Long-running agents rot as their history balloons. loops runs each iteration with a clean slate and lets progress accumulate where it belongs: in the workspace (files, git commits), not in a chat transcript. The loop carries only thin bookkeeping.
  - Memory in git, not in the transcript. Fresh context alone would mean amnesia. Ledger (below) writes the why to git as the work happens and reads it back before the next turn, so a clean slate is never a blank one.
- A real done-check. "Ask the model if it's finished" is the classic trap: the model grades its own homework. loops makes the gate a first-class value and lets you combine a deterministic signal (the tests genuinely pass) with a separate judge, so "done" means converged, not claims to be.
Everything else (DAGs, nesting, engines, budgets, the TUI) hangs off those ideas. The whole thing is small enough to read in an afternoon.
A loop is easy to start and hard to keep honest. Four parts decide whether it earns its cost, and loops is built around them.
| The hard part | In loops |
|---|---|
| The gate. Knowing the work is actually done, not just that the agent stopped. | A deterministic check (commandSucceeds) and a separate judge (agentCheck) in its own context, hardened with a k-of-n quorum and a geometric-mean rubric so one weak dimension sinks the verdict. The model that did the work never grades it. |
| Memory. Carrying what was learned across a run without dragging a transcript along. | The git commit log is the memory: a structured handoff per milestone, read back before the next turn. No STATE.md the model is trusted to keep tidy, no vector store to sync. |
| Parallelism. Running several agents without collisions on the same files. | isolation: 'worktree' gives each writer its own branch and worktree, landed back on pass with a --no-ff merge. |
| Hard stops. Bounding a loop so it cannot run forever or empty your account. | max caps iterations, budget caps tokens (a non-retryable stop the engine calls refuse to cross), and noProgress stalls out a loop whose iterations reach no new state, with the evidence on the outcome. |
Three things loops does that most loop libraries do not:
- Nesting is a primitive. loop() and dag() both return a Job, so loops nest inside DAGs and DAGs nest inside loops, to any depth. Orchestrating many loops is one expression, not a separate harness.
- Memory survives a squash merge. A squash merge flattens a branch's commit bodies into a list of subject lines and loses the reasoning. pullRequestJob and mergeJob keep the squashed commit body a consolidation of the branch.
- It runs against any model or tool. The agent launch only touches a one-method Engine, so the same loop runs on Claude, on a different model, or on your own provider, unchanged.
Two parts are deliberately out of scope. The heartbeat that fires a loop on a schedule belongs in cron, GitHub Actions, or a workflow engine, with a loops job inside. Acting in external tools is the agent's own job through its tools. loops is the body of the loop, kept small.
Status: alpha; the API is still settling. To work on loops or run it from a checkout:
git clone https://github.com/jonny981/loops.git
cd loops
npm install
node bin/loops.mjs --help # or: npm link → loops --help

Requires Node ≥ 20. Running from a checkout needs no build step: the CLI runs the TypeScript source directly through tsx.
Flags mode, the standard worker → until → review loop, no code:
loops run \
--prompt "Continue implementing the feature in TASK.md; report what changed." \
--engine claude-cli \
--until "Is the feature fully implemented with passing tests?" --threshold 0.85 \
--review "Does it pass a strict review with no blockers?" \
--max 20

Definition-file mode: full power and nesting. A .loop.ts file default-exports a Job:
loops validate examples/confidence-gate.loop.ts # offline pre-flight: load + print the shape, no model calls
loops describe examples/confidence-gate.loop.ts # print the loop's shape (gate, body, nodes) without running
loops describe examples/confidence-gate.loop.ts --json # machine-readable shape for agents
loops run examples/confidence-gate.loop.ts # live Ink TUI
loops run examples/confidence-gate.loop.ts --no-tui # plain streamed logs
loops run examples/confidence-gate.loop.ts --json # NDJSON event stream
loops run <file> imports and executes that file's module, like node <file>. Only run definition files you trust.
Authoring is agent-native. Both commands work from any repo, including one that consumes loops as a submodule or dependency (the recipe's folder just needs an ES module scope, which such repos already have). loops validate <file> is the cheap, no-model pre-flight an agent runs before loops run: it loads the loop, reports a fix-oriented error if anything is wrong, and prints the loop's shape (its gate, body, and dag nodes), all without spending a single agent turn. loops describe <file> prints that same shape on its own, so an agent can see exactly what it just authored. The authoring guide an agent reads to compose a loop is skills/author-loop/SKILL.md.
The end-to-end agent workflow, from authoring through reading a supervised run's decisions back as structured records rather than a raw event stream:
loops validate feature.loop.ts --json # pre-flight: loads, no spend
loops describe feature.loop.ts --json # the shape, incl. each agent node's contract
loops run feature.loop.ts --no-tui --supervise # run it, registered for observation
loops list # find the runId
loops tail <runId> # follow live events
loops records <runId> --kind revision --path ship/implementation --json # the semantic decision stream, filtered

Two supervision skills go deeper: skills/supervise-loop-run/SKILL.md (monitor a run) and skills/design-agent-team/SKILL.md (compose a specialist team).
Offline demo (no network, no key; uses the mock engine):
npm run example:poll

There is one universal unit of work, and two supporting types:
type Job = (ctx: JobContext) => Promise<Outcome>; // a unit of work, any size
type Condition = (ctx, last) => Promise<{ met; reason; confidence? }>; // a yes/no gate
interface Engine {
run(req, onEvent, signal): Promise<AgentResult>;
} // where an agent turn runs

- loop() returns a Job, so a loop nests by passing one as another's body or review.
- dag() returns a Job too, so loops and DAGs nest both ways: a DAG node can be a loop, a loop body can be a DAG.

Nesting is the absence of a special case, not a feature.
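Why nesting needs no special case can be seen in a toy model. The sketch below is not the library's implementation: MiniJob, MiniOutcome, miniLoop, and bump are hypothetical names, and the job is synchronous for brevity, whereas the real Job returns a Promise. The point it illustrates is only that a loop which returns a job can be passed wherever a job is expected, including as another loop's body.

```typescript
// Hypothetical, synchronous stand-ins for illustration only (the real Job is async).
type MiniStatus = 'pass' | 'exhausted';
interface MiniOutcome {
  status: MiniStatus;
  iterations: number;
}
type MiniState = Map<string, number>;
type MiniJob = (state: MiniState) => MiniOutcome;

// A loop is itself a MiniJob: run the body, check the gate, stop at max.
function miniLoop(opts: {
  body: MiniJob;
  until: (state: MiniState) => boolean;
  max: number;
}): MiniJob {
  return (state) => {
    for (let i = 1; i <= opts.max; i++) {
      opts.body(state);
      if (opts.until(state)) return { status: 'pass', iterations: i };
    }
    return { status: 'exhausted', iterations: opts.max };
  };
}

// A leaf job: increment a named counter once.
function bump(key: string): MiniJob {
  return (state) => {
    state.set(key, (state.get(key) ?? 0) + 1);
    return { status: 'pass', iterations: 1 };
  };
}

// Inner loop: keep bumping until the counter is a multiple of 3.
const innerLoop = miniLoop({
  body: bump('work'),
  until: (state) => (state.get('work') ?? 0) % 3 === 0,
  max: 10,
});

// Outer loop: its body is the inner loop itself, with no adapter in between.
const outerLoop = miniLoop({
  body: innerLoop,
  until: (state) => (state.get('work') ?? 0) >= 6,
  max: 5,
});
```

Because miniLoop returns the same type it accepts as a body, depth is unbounded without any extra machinery.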
loop({
name: 'build-feature',
body, // the Job run each iteration (fresh context); pass a loop()/dag() to nest
start, // gate before iterating; unmet ⇒ aborted
until, // checked after each body; met ⇒ stop (then review)
stopOn, // hard early-exit each iteration; met ⇒ aborted
review, // runs when until is met; non-pass re-enters the loop (folds back as ctx.lastReview)
max, // iteration cap; reached without passing ⇒ exhausted
noProgress, // stall out after n consecutive iterations with no observable progress
maxReviewRestarts, // cap the worker/reviewer standoff independently of max
delayMs, // delay between iterations (polling); interruptible by abort
retry, // { onError: 'continue' | 'fail', maxConsecutive?, backoffMs? }
onIteration,
onComplete, // hooks (onComplete runs once, whatever the outcome)
});

With no until, a pass body ends the loop. Terminal status is one of pass · fail · exhausted · aborted · paused (CLI exit codes 0 · 1 · 2 · 130 · 75). paused is a limit-driven, resumable stop. See Rate limits, quotas, and budgets.
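For a wrapper script, the status-to-exit-code mapping above can be written down as a lookup. This is a minimal sketch: the codes come from the text, while TerminalStatus, EXIT_CODES, and the wrapperAction policy are hypothetical names and choices, not part of the package.

```typescript
// Terminal statuses and their CLI exit codes, as listed in the text above.
type TerminalStatus = 'pass' | 'fail' | 'exhausted' | 'aborted' | 'paused';

const EXIT_CODES: Record<TerminalStatus, number> = {
  pass: 0,
  fail: 1,
  exhausted: 2,
  aborted: 130,
  paused: 75,
};

// Hypothetical wrapper policy: what a cron job might do with each exit code.
type WrapperAction = 'done' | 'resume-later' | 'alert';

function wrapperAction(exitCode: number): WrapperAction {
  if (exitCode === EXIT_CODES.pass) return 'done';
  // paused is resumable, so it is scheduled again rather than reported as a failure.
  if (exitCode === EXIT_CODES.paused) return 'resume-later';
  return 'alert';
}
```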
start / until / stopOn accept one item or many, freely mixing deterministic predicates and agent judges. Arrays are all by default (wrap in any(...) for or):
until: [
commandSucceeds('npm', ['test']), // deterministic ground truth
agentCheck({ question: 'Good enough to ship?', threshold: 0.9 }), // agent-validated intent
];

Prefer this mixed form over a lone judge. A model's self-reported confidence is a weak, poorly-calibrated signal. Treat it as a guard on intent, with a deterministic check as the truth. Two ways to harden the judge itself:
// k-of-n jury: consensus, not one number
quorum(2, judgeA, judgeB, judgeC);
// one judge, multiple dimensions: opens on the GEOMETRIC MEAN,
// so a single weak dimension drags the verdict down
agentCheck({
question: 'Ready to ship?',
threshold: 0.8,
dimensions: ['intent match', 'evidence quality', 'outcome coherence'],
});

Builders: predicate, bodyPassed, minConfidence, commandSucceeds (a shell command exits 0), all, any, not, quorum (k-of-n), agentCheck (small-model judge), always, never, and gateJob (lift a condition into a Job, e.g. a reviewer).
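The arithmetic behind the two hardening options is small enough to sketch. The code below is an illustration under assumptions, not the library's internals: Verdict, geometricMean, rubricPasses, and quorumMet are hypothetical names, and per-dimension scores are assumed to lie in [0, 1]. It shows why a geometric mean lets one weak dimension sink a verdict that an arithmetic mean would wave through, and what a k-of-n count looks like.

```typescript
// Hypothetical shape for a single judge's answer.
interface Verdict {
  met: boolean;
  confidence: number;
}

// Geometric mean of per-dimension scores, assumed to lie in [0, 1].
function geometricMean(scores: number[]): number {
  if (scores.length === 0) throw new Error('need at least one dimension');
  // Any zero (or negative) score forces the mean to zero.
  if (scores.some((s) => s <= 0)) return 0;
  // Summing logs is the numerically stable way to take the n-th root of a product.
  const logSum = scores.reduce((acc, s) => acc + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

// The gate opens only when the geometric mean reaches the threshold.
function rubricPasses(scores: number[], threshold: number): boolean {
  return geometricMean(scores) >= threshold;
}

// k-of-n jury: at least k judges must say the condition is met.
function quorumMet(k: number, verdicts: Verdict[]): boolean {
  return verdicts.filter((v) => v.met).length >= k;
}

// Scores of 0.95, 0.95, and 0.5 average to about 0.8 arithmetically,
// but their geometric mean is roughly 0.77, so a 0.8 threshold stays shut.
const weakDimensionBlocked = !rubricPasses([0.95, 0.95, 0.5], 0.8);
```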
The gate detects success; nothing above detects a loop that is failing to converge. max bounds the attempt count and budget bounds the cost, but both fire only after the waste, and neither can tell slow-but-real convergence from the same failure five turns running. noProgress is that sensor: the loop ends exhausted once n consecutive iterations reach no state the run has not already seen.
loop({
name: 'build',
body: agentJob({ prompt: '…', ground: true }),
until: commandSucceeds('npm', ['test']),
max: 50, // generous runway for hard work…
noProgress: 3, // …because the doomed case exits after 3 flat iterations
});

Progress means novelty, not change. An iteration counts as progress when any evidence channel reaches something new:
- the workspace fingerprint (HEAD, pending diff, untracked content) is a state this run has never visited, so an agent oscillating A→B→A gets no credit for the return trip;
- the gate confidence beats its previous best by minConfidenceDelta (default 0.02), a high-water mark, so judge jitter is not progress but slow steady improvement accumulates until it clears the bar;
- a custom signal returns a value not already seen, the escape hatch for progress the worktree cannot show (a queue length, a passing-test count): noProgress: { window: 3, signal: (ctx) => queueDepth() }.
The default is conservative: one channel showing novelty keeps the loop alive, so real-but-slow work is never cut short. And the exit is a diagnosis, not just a stop: the outcome carries Outcome.stall (the flat iterations, the repeated gate reason, the per-channel evidence) and a loop:stall event fires for supervisors, so "stalled since iteration 5 on the same scope error" replaces "reached max iterations" and a fleet watcher can re-brief the loop instead of shrugging at it. This is also what makes a generous max safe to grant: the safety net and the runway stop being the same number.
Off by default, like commit: a polling loop legitimately makes no progress until the outside world changes. Flags mode: --stall-after <n>. Offline demo: npm run example:stall.
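The novelty rule described above can be sketched as a small tracker. This is an illustration under assumptions, not the package's code: IterationEvidence and createStallDetector are hypothetical names, the fingerprint is treated as an opaque string, and only two of the three channels (workspace state and gate confidence) are modelled; the custom signal channel is left out.

```typescript
// Hypothetical evidence for one iteration.
interface IterationEvidence {
  fingerprint: string; // stands in for a hash of HEAD + pending diff + untracked content
  confidence?: number; // gate confidence for this iteration, if a judge ran
}

// Returns an observe() function that reports true once `window`
// consecutive iterations have shown nothing new on any channel.
function createStallDetector(window: number, minConfidenceDelta = 0.02) {
  const seen = new Set<string>();
  let best = Number.NEGATIVE_INFINITY; // confidence high-water mark
  let flat = 0; // consecutive iterations without novelty

  return function observe(e: IterationEvidence): boolean {
    // Channel 1: a workspace state this run has never visited.
    const novelState = !seen.has(e.fingerprint);
    seen.add(e.fingerprint);

    // Channel 2: confidence beats the previous best by at least the delta.
    // The mark moves only on a real gain, so jitter below the delta never
    // counts, while slow improvement accumulates against the old mark.
    const c = e.confidence;
    const novelConfidence = c !== undefined && c >= best + minConfidenceDelta;
    if (c !== undefined && novelConfidence) best = c;

    // One channel showing novelty is enough to reset the counter.
    flat = novelState || novelConfidence ? 0 : flat + 1;
    return flat >= window;
  };
}
```

An agent oscillating between two states is caught because the return trips land on fingerprints already in the set, which is the "novelty, not change" distinction.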
Fresh context kills rot; on its own it would cause amnesia. Ledger is the core that closes the gap: the loop writes its reasoning to git as it works and reads it back before the next turn. No parallel database, no vector store; git is the index: nothing to build, embed, sync, or let go stale (the commit log can't drift out of sync with the code; it is the code's history). (Ledger is the engine; the commit log is the durable memory it reads and writes; .loops/ledger.md and .loops/prompt.md are the live scratch files for work in flight.)
The three tiers below form a progression. The scratch files record what failed and what was tried. The gate turns a fix into a verified fact. The milestone commit distills it into a durable decision. Grounding lets the next turn read that decision instead of re-deriving it.
- Scratch files: working memory and a handoff. Two gitignored files carry a unit of work forward. .loops/ledger.md is working memory for the agent(s) doing the work now: the harness auto-captures each grounded turn (the reasoning + a summary of actions), so the why is recorded even when no single agent holds it all at the end, and fanned-out peers share it. .loops/prompt.md is the handoff the agent distils for whoever continues: intent, alternatives ruled out, constraints, what is left. Grounding injects both into the next context; the commit body is the handoff plus a compacted working log.

      appendPrompt(ctx.workspace, { heading: 'Why', body: 'tried a token refresh; the gate still failed on scope' });
- Milestone commits: crystallise it. A commit is a milestone, not an iteration. When a loop converges, commitJob composes one structured body, the handoff plus a compacted working log (the way), welded to the diff (the what), then clears both scratch files. Turn it on with commit:; iterations stay durable in the workspace + scratch files, so the log holds only converged, reasoned-over checkpoints. Welded to its diff, a commit body is a permanent record any later agent can look back to, as far back as it wants. Finer milestones? Compose finer loops/nodes.

      loop({ name: 'build', body, until, commit: { subject: 'feat: the feature' } });
- Grounding: read it back. A fresh turn reads the recent commit log (past milestones) and this run's live scratch files (working memory + handoff), prepended to its prompt, so it knows what was already tried. The reach is branch-local: adjacent branches are in-flight and may never land, and the merge is where work becomes shared truth.

      agentJob({ label: 'work', prompt: 'Continue the task.', ground: true });
- Scaling the read: retrieval, then consolidation. Recent-N grounding is the default, but on a long, noisy log the relevant commit falls out of the window. ground: { retrieve: true } has a cheap model select the relevant commits by subject instead. Use it for long-horizon work. For an indefinite process, consolidateJob folds the history into a decision-preserving consolidated ledger: a bounded record that keeps every accrued decision verbatim (a naive progress summary loses the specifics), committed as a commit body (the coarse tier, grounded like any milestone, never a side file). Retrieval finds the relevant past commits; consolidation keeps all the decisions in bounded space: different jobs, both in the git grain.

      agentJob({ label: 'work', prompt: 'Continue.', ground: { retrieve: true } });
- Ship via PR: survive the squash. The commit log is the memory, but a squash merge collapses a branch's milestone bodies into one commit whose body defaults to a list of subject lines, the reasoning lost from the base branch. pullRequestJob closes that: it pushes the branch and opens (or idempotently updates) a PR whose body is the same consolidate fold scoped to this branch, kept current as milestones land. mergeJob then squash-merges with that synthesis as the commit body, gated on CI (auto: true hands the wait to GitHub; when: forgeChecks() is a synchronous gate). The host is the injectable Forge seam (the gh CLI by default), so it runs offline against a MockForge.

      sequence('ship', pullRequestJob({ base: 'main' }), mergeJob({ base: 'main', auto: true }));

The Ledger has two faces: cross-iteration (recover from your own failed attempts in a retry loop) and cross-node (honour an upstream node's decision a downstream agent could not otherwise know). Both need headroom. On one-shot, single-node work memory is only a tax. See docs/concepts.md for where it helps and the measured evidence in bench/RESULTS.md.
The agent launch only ever touches the Engine interface, so the loop knows nothing about your model, provider, or framework.
| name | backend | notes |
|---|---|---|
| codex | codex exec subprocess (execa) | fresh process per call; read-only unless bypassPermissions |
| claude-cli | claude subprocess (execa) | fresh process per call; uses host Claude auth, no key |
| agent-sdk | @anthropic-ai/claude-agent-sdk | fresh query() per call; host Claude auth |
| anthropic-api | @anthropic-ai/sdk | token-level streaming; cheapest for judges; needs a key |
| mock | scripted, offline | for tests and examples |
Select per-run (--engine, RunOptions.engine) or per-job/condition (engine: takes a name or a ready-made Engine). Bring your own in ~10 lines:
import { run, type Engine } from '@loops-adk/core';
const myEngine: Engine = {
name: 'my-provider',
async run(req, onEvent, signal) {
// call any provider/framework; stream tokens via onEvent({ type: 'text', delta })
return { text, usage: { inputTokens, outputTokens }, model: req.model ?? 'x' };
},
};
await run(job, { engine: 'my-provider', engines: { 'my-provider': myEngine } });

That's the whole contract: implement run, register a name. A managed/durable runner could be a drop-in engine too.
Instead of a wall of inline prompt, define each agent as a reusable, job-specific AgentDef: the persona and methodologies live in editable markdown files, the structure and types live in TypeScript. The .ts is the strongly-typed wrapper around the .md:
import { defineAgent, defineSkill, fromFile, agentJob } from '@loops-adk/core';
const tdd = defineSkill({ name: 'tdd', instructions: fromFile(new URL('./skills/tdd.md', import.meta.url)) });
const storeEngineer = defineAgent({
name: 'store-engineer',
system: fromFile(new URL('./agents/store-engineer.md', import.meta.url)), // the persona, as markdown
model: 'sonnet',
tools: ['edit', 'bash'],
tier: 'worker',
capabilities: ['storage engine', 'id stability'],
outputs: [{ name: 'patch' }, { name: 'test-report' }],
requiresSkills: ['contract-first'],
skills: [tdd], // methodologies fold into the system
usesSkills: ['small-diff'],
humanGates: [{ name: 'prod-approval', when: 'deploying production changes' }],
failureModes: [{ mode: 'tests-flaky', recovery: 'isolate the flake, retry once' }],
});
agentJob({ agent: storeEngineer, prompt: 'Build the store to its tests.', ground: true });

For a small runnable contract plus feedback example, see
examples/contracted-agent.loop.ts.
agentJob resolves the def into the engine request (system = persona + skills, plus model/tools); inline system/model/tools still override it. A skill is a methodology (how to work: TDD, writing-plans), not a worker. The extra contract fields are optional metadata for validation, loops describe, docs, and future discovery. They do not give an agent dispatch authority. This is what turns a dag into a named team (storeEngineer, apiEngineer, securityReviewer as small files) orchestrated by the DAG and gated by quorum(...).
A gate is only as honest as what it tests. commandSucceeds('npm', ['test']) checks files on disk; to check that the thing works you need it running. The Environment axis is where code runs (local services or a per-branch cloud preview), so until can gate on the live preview, not just static files. It is the third provider axis:
| Axis | Where it… | Lives in |
|---|---|---|
| Engine | the agent thinks | model / provider |
| Workspace | the code lives | worktree + branch |
| Environment | the code runs | local / cloud preview |
Like Engine, loops owns only the interface and the lifecycle binding; the adapter (sst, Vercel, Docker…) is yours and lives next to the deploy config it wraps; loops never depends on a deploy tool. The handle's env (e.g. BASE_URL) is injected into gate commands, so the done-check reaches the live preview.
import { run, loop, commandSucceeds, type Environment } from '@loops-adk/core';
const sstEnv: Environment = {
name: 'sst',
async up(ws) {
const url = await deployStage(slug(ws.branch), ws.dir); // your deploy
return { url, env: { BASE_URL: url }, down: () => removeStage(slug(ws.branch)) };
},
};
const job = loop({ name: 'build', body, until: commandSucceeds('playwright', ['test']) });
await run(job, { environment: sstEnv }); // one env for the run…
// …or DagConfig.environment to give every worktree-team its own stage, named after its branch.

Environments are optional: a research pipeline that never deploys just leaves it unset, and the gates test files and commands without a BASE_URL.
Built-in adapters (opt-in subpaths, no added dependency; they shell out to the CLI on PATH):
- @loops-adk/core/env/command: commandEnvironment, the generic factory every IaC tool fits (deploy / read outputs / destroy). sst, terraform, pulumi, and cloudformation-via-aws-cli are all thin presets over it.
- @loops-adk/core/env/sst: sstEnvironment, a per-branch sst stage (sst deploy --stage <branch>).
- @loops-adk/core/env/docker: dockerEnvironment, a local stack via a per-branch Docker Compose project, with ephemeral-port discovery so parallel branches never collide.

SDK-bound adapters (e.g. the AWS SDK) add a real dependency, so they belong in your own package or loop definition, not the core.
import { dag, sequence, parallel, loop, agentJob, gateJob, agentCheck } from '@loops-adk/core';
dag({
name: 'ship',
concurrency: 2,
nodes: {
research: agentJob({ label: 'research', prompt: '…' }),
implement: { needs: ['research'], job: loop({ /* … a loop as a node */ }) },
test: { needs: ['implement'], job: agentJob({ label: 'test', prompt: '…' }) },
review: { needs: ['test'], job: gateJob('review', agentCheck({ /* … */ })) },
},
});

needs = dependencies; a non-pass required dependency blocks its dependents; optional nodes never block or fail the DAG; an unmet when skips a node (counts green); cycles are detected before any work runs. sequence(name, ...jobs) and parallel(name, jobs, concurrency?) are sugar over dag.
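Ordering nodes by their needs and rejecting cycles before any work runs is a standard topological sort. The sketch below uses Kahn's algorithm as one way to do it; whether the library uses this particular algorithm is not stated in this document, and NodeSpec and topoOrder are hypothetical names. Optional nodes, when conditions, and concurrency are deliberately left out.

```typescript
// Hypothetical, reduced node shape: only the dependency list matters here.
interface NodeSpec {
  needs?: string[];
}

// Kahn's algorithm: repeatedly emit nodes whose dependencies are all emitted.
// Throws on an unknown dependency or a cycle, before anything would run.
function topoOrder(nodes: Record<string, NodeSpec>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const key of Object.keys(nodes)) {
    indegree.set(key, 0);
    dependents.set(key, []);
  }
  for (const key of Object.keys(nodes)) {
    for (const dep of nodes[key].needs ?? []) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`node "${key}" needs unknown node "${dep}"`);
      list.push(key);
      indegree.set(key, (indegree.get(key) ?? 0) + 1);
    }
  }

  // Start with every node that has no dependencies.
  const ready: string[] = Object.keys(nodes).filter((key) => indegree.get(key) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const child of dependents.get(next) ?? []) {
      const remaining = (indegree.get(child) ?? 0) - 1;
      indegree.set(child, remaining);
      if (remaining === 0) ready.push(child);
    }
  }

  // Any node never emitted sits on a cycle.
  if (order.length !== Object.keys(nodes).length) throw new Error('cycle detected in dag');
  return order;
}

// The node graph from the build-service example earlier in this document.
const serviceOrder = topoOrder({
  store: {},
  api: { needs: ['store'] },
  serialize: { needs: ['store'] },
  client: { needs: ['api', 'serialize'] },
});
```

In this ordering api and serialize become ready at the same moment, which is where a concurrency setting above 1 would let them run side by side.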
Review feedback is a structured revision request. In a loop, a failing review
outcome is threaded into the next body turn as ctx.lastReview; with
consumeFeedback: true, agentJob appends it to the implementation prompt in a
standard block.
const implement = agentJob({
label: 'implementation',
prompt: brief,
consumeFeedback: true,
});

For several reviewers, use reviewPanel to aggregate their verdicts into one
outcome. Every reviewer is a gate: the panel passes when all of them clear (or
pass: N of them, k-of-n), and each failing reviewer's concern is surfaced as a
blocking finding threaded into the next pass. An empty panel is a construction
error, not a vacuous pass.
const review = reviewPanel({
// pass: 2, // optional: k-of-n instead of all
reviewers: [
{ name: 'security', review: agentCheck({ question: 'Is it safe?', context: reviewContext({ diff: true, ledger: true }) }) },
{ name: 'correctness', review: agentCheck({ question: 'Is it correct?' }) },
{ name: 'simplicity', review: agentCheck({ question: 'Is it simple?', context: reviewContext({ files: ['src/**'] }) }) },
],
});

In a DAG, a targeted revisionRequest({ target, findings }) reruns the target
node and its dependents when maxKickbacks allows it. kickback(to, reason) is
the terse compatibility helper for the same routed feedback. Agents can opt into
a small graph-position prompt block with graphContext: true.
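The panel rule described earlier in this section (all reviewers must clear, or N of them; failing reviewers contribute findings; an empty panel is an error) can be sketched as a pure aggregation function. This is an illustration only: ReviewerVerdict, PanelOutcome, and aggregatePanel are hypothetical names, and returning findings even when a k-of-n panel passes is an assumption of this sketch, not documented behaviour.

```typescript
// Hypothetical result of one reviewer.
interface ReviewerVerdict {
  reviewer: string;
  met: boolean;
  reason: string;
}

// Hypothetical aggregated outcome.
interface PanelOutcome {
  pass: boolean;
  findings: string[];
}

// Aggregate reviewer verdicts: unanimous by default, k-of-n when passCount is given.
function aggregatePanel(verdicts: ReviewerVerdict[], passCount?: number): PanelOutcome {
  // An empty panel is a construction error, not a vacuous pass.
  if (verdicts.length === 0) throw new Error('review panel needs at least one reviewer');

  const required = passCount ?? verdicts.length;
  const cleared = verdicts.filter((v) => v.met).length;

  // Each failing reviewer's concern becomes a finding for the next pass.
  const findings = verdicts
    .filter((v) => !v.met)
    .map((v) => `${v.reviewer}: ${v.reason}`);

  return { pass: cleared >= required, findings };
}
```

Keeping the aggregation pure makes the unanimous and k-of-n modes differ by a single number, which matches how the optional pass field is described.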
Worktree isolation: branches as teams. A concurrent node can run in its own git worktree on a fork branch (isolation: 'worktree' on the DAG, or isolate: true per node), so parallel writers never collide on files or the index. On pass, its committed work lands back into the line with a --no-ff merge; a conflict fails the node honestly (loops does not auto-resolve; that's a separate layer). Each team gets its own branch, its own scratch files, and (with DagConfig.environment) its own stage, all born and torn down together.
For dynamic dispatch (a loop that discovers each unit at runtime and routes it to its own isolated sub-loop), isolated(job) is the same boundary as a composable wrapper rather than a predeclared node (fork, run, land back on pass):
loop({ name: 'triage', until: queueEmpty, body: pickAndDispatch });
// where pickAndDispatch routes each ticket to isolated(convergeLoop) or isolated(sweep)

A loop is not one shape. Three recur, and they differ in what memory does and in what you can even measure: a harness built for one is blind to the others.
| | Converge | Sweep | Tend |
|---|---|---|---|
| shape | one hard target, retried | a known set, one fresh task each | an unbounded process picking the next unit |
| example | build to a high bar with tests | research each OEM | triage issues until none remain |
| iteration N vs N−1 | the same task | an independent task | a discovered task |
| terminates when | the gate passes | the worklist is empty | a dynamic condition (maybe never) |
| memory's job | don't re-walk dead ends | transfer the house style | remember what's done + decided, forever |
| loops shape | loop({ until: gate, max }) | loop/dag over a worklist | loop({ until: dynamic, max: ∞ }) |

They nest: GitHub triage is Tend ∘ Converge (pick the next ticket, classify it, dispatch a Converge loop to a test gate); OEM research is Sweep ∘ Converge (each item is itself a multi-step build that must converge). Because a loop and a dag are both Jobs, dispatch is just a body that selects a sub-Job. Wrap it in isolated() when each needs its own worktree. The Ledger's three tiers (scratch files → milestone commits → consolidated ledger) map onto the three nesting levels.
There is no converge() / sweep() / tend() in the API. They are patterns, not primitives. Copy-paste recipes for each (and the nested dispatch) are in docs/patterns.md; the full treatment is in docs/concepts.md.
Four opt-in RunOptions (with matching CLI flags). All default off.
| Option | CLI flag | Effect |
|---|---|---|
| budget | --budget <n> | Cap total tokens for the run. Engine calls refuse once the cap is hit. |
| recordTo | --record <path> | Append every structured event as JSONL: a readable, queryable run record. |
| checkpoint | --checkpoint <p> | Snapshot the shared ctx.state at each loop/dag/job boundary (latest-wins). |
| resumeFrom | --resume <path> | Restore the ctx.state a prior --checkpoint wrote, so a re-run continues warm. |


```ts
await run(job, { budget: 2_000_000, recordTo: '.loops/run.jsonl', checkpoint: '.loops/state.json' });
// later, after a crash or a deliberate stop:
await run(job, { resumeFrom: '.loops/state.json' });
```

`budget` is the cost guard for a loop that fires a worker plus several judges per iteration: `max` bounds the call count, `budget` bounds their cost. The object form `{ limit, headroom, soft }` gives a soft, warn-don't-refuse mode.
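
The difference between the hard and soft modes can be sketched in a few lines. This is an illustration, not the library's budget code: `TokenBudget`, `admit`, `record`, and `simulate` are hypothetical names, and `headroom` is left out because its semantics are not described here. The guard is checked before each engine call and refuses (or, in soft mode, only warns) once the cap has been hit.

```typescript
// Hypothetical token budget guard; names and shape are assumptions.
class TokenBudget {
  private spent = 0;
  private readonly limit: number;
  private readonly soft: boolean;
  readonly warnings: string[] = [];

  constructor(limit: number, soft = false) {
    this.limit = limit;
    this.soft = soft;
  }

  // Call before each engine call: refuse, or warn in soft mode, once the cap is hit.
  admit(): void {
    if (this.spent < this.limit) return;
    const message = `budget exhausted: ${this.spent}/${this.limit} tokens`;
    if (this.soft) {
      this.warnings.push(message);
      return;
    }
    const error = new Error(message);
    error.name = "BudgetExceeded";
    throw error;
  }

  // Call after each engine call with the tokens it actually used.
  record(tokens: number): void {
    this.spent += tokens;
  }
}

// `max` bounds the iterations; the budget bounds what their calls may cost.
// Returns how many engine calls were made before the run stopped.
function simulate(budget: TokenBudget, max: number, callsPerIteration: number, tokensPerCall: number): number {
  let calls = 0;
  try {
    for (let i = 0; i < max; i++) {
      for (let c = 0; c < callsPerIteration; c++) {
        budget.admit();
        budget.record(tokensPerCall);
        calls += 1;
      }
    }
  } catch (err) {
    // Only swallow the budget refusal; anything else is a real failure.
    if (!(err instanceof Error && err.name === "BudgetExceeded")) throw err;
  }
  return calls;
}

const hardCalls = simulate(new TokenBudget(1_000), 10, 3, 200);
const softBudget = new TokenBudget(1_000, true);
const softCalls = simulate(softBudget, 10, 3, 200);
console.log(hardCalls, softCalls, softBudget.warnings.length);
```

With a worker plus two judges per iteration, the hard budget stops the run partway through the second iteration even though `max` would allow ten; the soft budget lets all ten finish and records a warning for every call made past the cap.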
When a run hits a provider rate limit, an account usage allowance, or its own token budget, the onLimit policy decides what happens. The default, auto, waits when the reset is known and within a cap, otherwise checkpoints and exits with a ready-to-paste resume command.
| Option | CLI flag | Default | Effect |
|---|---|---|---|
| `onLimit` | `--on-limit <policy>` | `auto` | `auto` waits for a known reset ≤ `maxWaitMs`, else pauses · `wait` always waits for a known reset · `exit-resume` never waits · `fail` is the old fatal behaviour |
| `maxWaitMs` | `--max-wait <dur>` | `300000` (5m) | Ceiling on a single interruptible limit-wait under `auto`/`wait`. |

A wait is interruptible (Ctrl-C unwinds it). The policy gives up when the reset is unknown, when the wait would exceed `maxWaitMs`, or when the policy is `exit-resume`; it always gives up on a budget limit, which never refreshes mid-run. The run then ends with the terminal status `paused` (exit code 75, `EX_TEMPFAIL`, distinct from `fail`'s 1) so a wrapper or cron job can tell "paused, resumable" from "failed". With `--checkpoint` set, the resume command is printed ready to paste; without one, the guidance says to re-run with `--checkpoint` to make a pause resumable.
The error taxonomy backs this: an engine classifies a throttle into a RATE_LIMIT or QUOTA LoopError carrying the reset hint (retryAfterMs / resetAt) it could read. RATE_LIMIT is retryable; QUOTA is retryable only when a reset is known; BUDGET never is.
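
The policy table and the retry rules above can be read as one small decision function. The sketch below is a reading of the prose, not the library's code: `Limit`, `isRetryable`, and `decide` are hypothetical names, only the `retryAfterMs` hint is modelled, and two points are assumptions: that `fail` wins over everything else, and that `wait` ignores the `maxWaitMs` ceiling when deciding whether to wait at all.

```typescript
type LimitKind = "RATE_LIMIT" | "QUOTA" | "BUDGET";
type Policy = "auto" | "wait" | "exit-resume" | "fail";
type Action = "wait" | "pause" | "fail";

// Hypothetical shape: a classified limit plus the reset hint, if one was readable.
interface Limit {
  kind: LimitKind;
  retryAfterMs?: number;
}

// RATE_LIMIT is retryable; QUOTA only when a reset is known; BUDGET never.
function isRetryable(limit: Limit): boolean {
  if (limit.kind === "RATE_LIMIT") return true;
  if (limit.kind === "QUOTA") return limit.retryAfterMs !== undefined;
  return false;
}

// Decide what the run does when it hits a limit.
function decide(policy: Policy, limit: Limit, maxWaitMs = 300_000): Action {
  if (policy === "fail") return "fail"; // assumption: fail always wins
  if (policy === "exit-resume") return "pause"; // never waits
  if (limit.kind === "BUDGET") return "pause"; // a budget never refreshes mid-run
  if (limit.retryAfterMs === undefined) return "pause"; // unknown reset
  if (policy === "wait") return "wait"; // assumption: ceiling not applied here
  return limit.retryAfterMs <= maxWaitMs ? "wait" : "pause"; // auto
}

console.log(decide("auto", { kind: "RATE_LIMIT", retryAfterMs: 60_000 }));
console.log(decide("auto", { kind: "RATE_LIMIT", retryAfterMs: 600_000 }));
console.log(decide("auto", { kind: "QUOTA" }));
```

In this reading, `pause` is the outcome that maps to the `paused` terminal status and exit code 75.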

- Ink TUI (default on a TTY): a live loop/dag tree, a per-iteration detail panel you can browse while the run continues, and a stats footer. Navigate with `↑/↓` (nodes), `←/→` (iterations), `f`/`space` (follow-live), `q`/`Esc`/`Ctrl-C` (abort).
- `--no-tui`: streamed line logs, one concise report per completed iteration, e.g. `↳ iter 2: body=fail · until=not met · review=fail (needs X) · 1.2k/0.3k tok`.
- `--json`: NDJSON event stream on stdout.

Every mode ends with a summary: result, per-loop iterations, review tallies, token usage by model, and any errors.
Run with --supervise and the loop registers itself under ~/.loops/runs/, writing its live state there as it goes. Another process reads it with no daemon and no socket, because the filesystem is the channel (the same bet the rest of the library makes).

```sh
loops run build.loop.ts --supervise   # in one terminal
loops list                            # in another: every supervised run, with state and iteration
loops status <runId>                  # its shape plus where it is now: iteration, last gate verdict, tokens
loops tail <runId>                    # stream its events live
```

Each run keeps the raw event stream in `events.jsonl` and a smaller semantic stream in `semantic.jsonl` with dispatch, completion, surfacing, revision-emitted, and revision-routed records. Use `loops records <runId>` to inspect those records without knowing the registry path; add `--kind revision-routed`, `--kind revision` (both revision kinds), `--path ship/implementation`, `--since <time>`, `--last <n>`, or `--json` when an agent needs a filtered machine-readable stream.

`list` marks a run dead if its process is gone. The read side is also on the public surface (`listRuns`, `readRunStatus`, `runEventsPath`, `runSemanticRecordsPath`), so an agent supervising a fleet of loops, killing the ones that drift and kicking work back into the ones that hit a problem, reads the same files. Out-of-process control (pause, abort, and kickback from outside) is the next step.

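
Because the records are plain JSONL, a supervising agent can also filter them itself. The sketch below mirrors the `--kind`, `--path`, and `--last` filters on an in-memory string; it is not the library's reader. The record fields `kind` and `path` are assumed from the flag names, the sample records are made up, and `parseJsonl` and `filterRecords` are hypothetical helpers.

```typescript
// Assumed record shape: a `kind`, an optional `path`, and anything else.
interface SemanticRecord {
  kind: string;
  path?: string;
  [key: string]: unknown;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): SemanticRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SemanticRecord);
}

interface RecordFilter {
  kind?: string; // "revision" matches both revision-emitted and revision-routed
  path?: string;
  last?: number;
}

function filterRecords(records: SemanticRecord[], filter: RecordFilter): SemanticRecord[] {
  let out = records;
  const { kind, path, last } = filter;
  if (kind !== undefined) {
    out = out.filter((r) => (kind === "revision" ? r.kind.indexOf("revision-") === 0 : r.kind === kind));
  }
  if (path !== undefined) {
    out = out.filter((r) => r.path === path);
  }
  if (last !== undefined) {
    // slice(-0) would return everything, so handle a non-positive count explicitly.
    out = last > 0 ? out.slice(-last) : [];
  }
  return out;
}

// Made-up sample records, standing in for the contents of semantic.jsonl.
const sample = [
  '{"kind":"dispatch","path":"ship/implementation"}',
  '{"kind":"revision-emitted","path":"ship/review"}',
  '{"kind":"completion","path":"ship/implementation"}',
  '{"kind":"revision-routed","path":"ship/implementation"}',
  "",
].join("\n");

const records = parseJsonl(sample);
const revisions = filterRecords(records, { kind: "revision" }).map((r) => r.kind);
const recent = filterRecords(records, { path: "ship/implementation", last: 2 }).map((r) => r.kind);
console.log(revisions, recent);
```

In real use the text would come from the file that `runSemanticRecordsPath` points at; how that function is called is not shown here.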
loops is a fresh-context loop primitive, not a durable workflow engine. The design bet is that the workspace is the state: progress and its reasoning live in git (the Ledger), so each iteration can start clean and still know what came before. If the process dies mid-run, you re-run against the same workspace (the worktree holds the files, the scratch files hold the why, the log holds the milestones) and continue. You lose the bookkeeping, not the work.
It deliberately does not do durable mid-run replay (re-running a half-finished graph and skipping completed steps). That's an orchestration concern; for it, embed a loops job as a step inside Temporal, LangGraph, or Mastra. What it does offer (run records, a thin state checkpoint, a token budget) is the lightweight version that fits the workspace-is-state model.
| You want… | Reach for… |
|---|---|
| Loop an agent to convergence with a real done-gate | loops (you're here) |
| Durable, resumable, replayable workflows | Temporal / LangGraph / Mastra |
| One agent call with tool use | your provider's SDK directly |
- Ledger, git-memory core: the scratch files (working memory + handoff), grounding, milestone commits
- Worktree isolation (branches-as-teams) with `--no-ff` land-back
- Environment axis: provider interface + offline mock
- Publish to npm (`@loops-adk/core`, built `dist` + types, CI release)
- Supervision: a file-based run registry with `loops list/status/tail`
- Out-of-process control: `pause`/`abort`/`kickback` a running loop from outside
- Optional `wip:` autosave tier (per-iteration recovery, squashed on convergence)
- No-progress / stall detection (`noProgress`): the third hard stop, alongside `max` and `budget`
- `cost per accepted change` as a first-class reported metric
- Calibration helpers for agent judges
- More engine adapters (OpenAI, local models)
- Scrollable per-iteration transcript in the TUI

```sh
npm test            # vitest: offline, deterministic via the mock engine
npm run typecheck   # tsc --noEmit
```

Contributions welcome. Open an issue to discuss anything substantial first. Keep the core small; that smallness is the point.
