Progressive, provider-neutral testing plan runner for AI agents.
Agent Check exists for the testing gap that appears when AI agents are rapidly creating or changing applications.
Most traditional testing tools are built for repeatable retesting. They work well after a human or automation engineer has already encoded stable selectors, routes, assertions, and flows. That is valuable, but it is often too rigid for approval-style testing of freshly generated code, where the app may have just been created and exact details like selectors, accessibility labels, terminal states, or window controls may still be unknown.
Pure natural-language computer-use tools sit at the other end of the spectrum. They can explore from vague instructions, but they usually need an LLM call and fresh visual/context input at every step. That becomes slow and expensive very quickly, especially when many agents or many features are being tested in parallel.
Agent Check is a hybrid between those models. An agent can write a testing plan at the level it currently knows:
- high-level `semantic`, `task`, `intent`, or `visual` candidates when the app is new
- `structural` candidates when accessible roles, labels, text, or terminal regions are known
- `exact` candidates when stable ids, selectors, commands, or automation ids are known
At runtime, the runner uses providers to execute deterministic parts directly and uses LLM inference only where semantic interpretation is needed. When a high-level candidate is resolved into a lower-level target, the run can produce refinements so future plans become cheaper, faster, and more deterministic.
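The split between deterministic execution and LLM-assisted resolution can be pictured with a small sketch. The `Candidate` type, the `chooseCandidate` function, and the rule that `exact` and `structural` candidates are the deterministic ones are illustrative assumptions, not Agent Check's actual internals; only the candidate level names come from the text above.

```typescript
// Hypothetical model of candidate levels. Only the level names come from the README.
type CandidateLevel = "exact" | "structural" | "semantic" | "visual";

interface Candidate {
  level: CandidateLevel;
  description: string;
}

interface Resolution {
  candidate: Candidate;
  usesLlm: boolean;
}

// Assumption: these levels can be resolved by a provider without LLM inference.
const DETERMINISTIC_LEVELS: ReadonlySet<CandidateLevel> = new Set<CandidateLevel>([
  "exact",
  "structural",
]);

// Prefer the first deterministic candidate. Fall back to an LLM-resolved
// candidate only when no deterministic one exists and the LLM is enabled.
function chooseCandidate(
  candidates: Candidate[],
  llmEnabled: boolean
): Resolution | undefined {
  const deterministic = candidates.find((c) => DETERMINISTIC_LEVELS.has(c.level));
  if (deterministic) {
    return { candidate: deterministic, usesLlm: false };
  }
  if (!llmEnabled) {
    return undefined;
  }
  const fallback = candidates[0];
  return fallback ? { candidate: fallback, usesLlm: true } : undefined;
}
```

Under these assumptions, a step that carries both a semantic and an exact candidate never triggers an LLM call, which is the cost saving that refinements are meant to unlock over time.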
The goal is to balance AI flexibility with testing-framework determinism: agents can test immediately after generating code without needing every exact detail up front, and they can progressively harden those tests as the application stabilizes.
Agent Check accepts a YAML Testing Plan, validates it, executes it through runtime providers, collects evidence, classifies failures, and writes run artifacts. It does not author plans. Plans can be written manually, by an agent, or by any external system.
A plan describes user-visible behavior:
- what app surface to test, such as `web`, `tui`, `desktop`, or `electron`
- what the user is trying to do
- ordered flow steps
- exact, structural, semantic, visual, and provider-hint candidates
- assertions and failure policy
Concrete engines stay in providers. A testing plan should not mention Playwright, Appium, OpenAI, Selenium, or other implementation engines.
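A plan author could check this rule with a small lint step before committing a plan. The sketch below is a hypothetical helper, not part of Agent Check; whether the built-in validator performs a similar check is not stated here. The engine list is taken from the sentence above.

```typescript
// Engine names from the sentence above; extend the list as needed.
const ENGINE_NAMES = ["playwright", "appium", "openai", "selenium"];

// Returns the engine names that appear anywhere in the raw plan text
// (case-insensitive substring match, so it may flag harmless mentions).
function findEngineMentions(planText: string): string[] {
  const lower = planText.toLowerCase();
  return ENGINE_NAMES.filter((name) => lower.indexOf(name) !== -1);
}
```

An empty result means the plan text stays engine-neutral under this simple check. Provider ids such as `web-playwright` belong in the runtime config, so they would not appear in a plan that follows the rule.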
From npm:
npm install -g @codebolt/agent-check
From this repository:
npm install
npm run build
node dist\cli\index.js doctor
Validate a plan:
agent-check validate examples\web-headed-exact-candidates.plan.yaml
List available providers:
agent-check providers --config agent-check.config.yaml
Run a headed Chrome web plan without LLM:
agent-check run examples\web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm --run-id web-exact
Run a headed Chrome web plan with LLM semantic resolution:
agent-check run examples\web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7 --run-id web-semantic
Artifacts are written to:
.agent-check/runs/<runId>/
Important files in every run:
- `result.json`: final pass/fail status, failed step, failure class, artifacts
- `trace.json`: provider execution trace
- `trace.jsonl`: lightweight per-step status log
- `llm-trace.json`: LLM candidate resolution, assertion judgement, and failure classification events
- screenshots/snapshots returned by providers
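For scripts that post-process a run, the artifact locations can be derived from the artifact store and the run id. This is a minimal sketch assuming the `<artifactStore>/runs/<runId>/` layout and the file names listed above; the `runArtifactPaths` helper is hypothetical and not part of the CLI.

```typescript
// File names as listed above; screenshots and snapshots vary by provider
// and are not included here.
const RUN_FILES = ["result.json", "trace.json", "trace.jsonl", "llm-trace.json"] as const;

// Builds the expected path of each run file using forward slashes.
function runArtifactPaths(artifactStore: string, runId: string): string[] {
  const runDir = `${artifactStore}/runs/${runId}`;
  return RUN_FILES.map((file) => `${runDir}/${file}`);
}
```

For example, `runArtifactPaths(".agent-check", "web-exact")` starts with `.agent-check/runs/web-exact/result.json`.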
Runtime choices live outside the plan in agent-check.config.yaml.
artifactStore: .agent-check
execution:
  maxRecoveryAttemptsPerStep: 10
llm:
  enabled: true
  model: codex-cli/gpt-5.5
providers:
  mock:
    enabled: true
  webPlaywright:
    enabled: true
  tuiProcess:
    enabled: true
  electronPlaywright:
    enabled: true
  windowsDesktop:
    enabled: true
Model precedence is:
--model > AI_MODEL from .env/environment > agent-check.config.yaml
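The precedence rule amounts to "first configured value wins". The sketch below illustrates that reading; `resolveModel` is a hypothetical function, and treating an empty string as unset is an assumption rather than documented behavior.

```typescript
// Returns the first defined, non-empty value in precedence order:
// --model flag, then AI_MODEL, then the config file value.
function resolveModel(
  cliModel?: string,
  envModel?: string,
  configModel?: string
): string | undefined {
  for (const candidate of [cliModel, envModel, configModel]) {
    // Assumption: blank values count as "not set".
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}
```

With `--model zhipu/glm-4.7` on the command line, the config value `codex-cli/gpt-5.5` would be ignored under this rule.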
The CLI loads .env automatically. Do not commit secrets. For LLM backends,
model prefixes, environment variables, and llm-trace.json, see
docs/LLM.md.
| Provider id | Surface | Notes |
|---|---|---|
| `mock` | `mock` | Deterministic provider for runner tests and examples. |
| `web-playwright` | `web` | Browser provider for websites. Supports headed/headless mode and `browserChannel: chrome`. |
| `tui-process` | `tui` | Spawns a command, sends text/keys, reads terminal output. |
| `electron-playwright` | `electron` | Launches Electron through Playwright and acts on renderer controls. |
| `windows-desktop` | `desktop` | Windows UI Automation provider for basic native controls. |
Mobile candidate types exist in the schema, but a mobile provider is not implemented yet.
Provider details and custom-provider guidance:
Web, headed Chrome:
agent-check run examples\web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-structural-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-provider-hint-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7
Mock:
agent-check run examples\mock-pass.plan.yaml --no-llm
agent-check run examples\mock-fail.plan.yaml --no-llm
TUI:
agent-check run examples\tui-local.plan.yaml --config agent-check.config.yaml --no-llm
Detailed guides:
The short version:
specVersion: agent-check/v1
kind: TestingPlan
metadata:
  id: web-smoke
  title: User can create a project
target:
  appRef: local-web-fixture
  surface: web
  baseUrl: file:///D:/agentictest/examples/web-fixture.html
  headless: false
  browserChannel: chrome
intent:
  summary: Verify a user can create a project.
  acceptance:
    - The form opens.
    - The project name can be entered.
    - The created project message appears.
flow:
  - id: fill_project_name
    goal: Fill the project name field
    operation:
      type: input
      value: Example Project
    candidates:
      - semantic:
          instruction: Find the project name field.
    success:
      any:
        - semantic:
            intent: The project name was entered.
npm install
npm run build
npm test
npm run check
Package preview:
npm pack --dry-run --json
- Use `--no-llm` for deterministic exact/structural/providerHint examples.
- Use `llm-trace.json` to verify that semantic plans actually called the LLM. See docs/LLM.md.
- Provider errors, missing app sessions, browser launch failures, and LLM account failures should classify as `environment`.