
feat: add log-watcher workflow for agent run diagnostics #327

Open
adamhenson wants to merge 2 commits into githubnext:main from adamhenson:add-log-watcher


@adamhenson
Contributor

Summary

  • Add workflows/log-watcher.md, which fires on workflow_run: completed, downloads the agent-artifacts artifact written by gh-aw's firewall, scans the run logs for error patterns and retry loops, analyses token-usage.jsonl for anomalies, and posts a health diagnosis on the associated pull request or creates a diagnosis issue
  • Add docs/log-watcher.md with installation instructions, a mermaid flow diagram, a health level reference, and the list of detections
  • Update README.md to add Log Watcher to the Fault Analysis Workflows section
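
A minimal sketch of the log-scanning step, assuming plain-text log lines. The regexes, the "tool_call: name=..." line format, and the scan_log helper are hypothetical illustrations, not the workflow's actual patterns; only the categories and the "same tool called more than 5 times" retry threshold come from this PR.

```python
import re
from collections import Counter

# Hypothetical patterns for the categories named in this PR.
PATTERNS = {
    "error": re.compile(r"\b(error|exception|fatal)\b", re.IGNORECASE),
    "timeout": re.compile(r"\btimed? ?out\b", re.IGNORECASE),
    "rate_limit": re.compile(r"\b429\b|rate limit", re.IGNORECASE),
    "truncation": re.compile(r"context (window|length).*(truncat|exceed)", re.IGNORECASE),
}

# Hypothetical log format for tool invocations, e.g. "tool_call: name=bash".
TOOL_CALL = re.compile(r"tool_call: name=(\S+)")
RETRY_LIMIT = 5  # a tool called more than 5 times counts as a retry loop


def scan_log(lines):
    """Collect matching lines per category and list tools that look stuck in a loop."""
    findings = {name: [] for name in PATTERNS}
    tool_counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings[name].append(line.strip())
        match = TOOL_CALL.search(line)
        if match:
            tool_counts[match.group(1)] += 1
    findings["retry_loop"] = sorted(t for t, n in tool_counts.items() if n > RETRY_LIMIT)
    return findings


if __name__ == "__main__":
    log = ["ERROR: request failed with 429 Too Many Requests"] + ["tool_call: name=bash"] * 6
    result = scan_log(log)
    print(result["rate_limit"], result["retry_loop"])
```

Counting calls per tool name is the simplest retry-loop heuristic; it cannot tell a genuine loop from a long, legitimate run of the same tool, which is one reason a threshold is needed at all.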

Related to #297. Companion to #319.

Notes

  • Silent on non-agent runs: if no agent-artifacts artifact exists, the workflow produces no output
  • Three health levels: Healthy runs get a brief collapsed summary; Degraded and Failed runs get a full diagnosis with log excerpts and token metric details
  • Detects error/exception/fatal messages, timeouts, rate limits (429), retry loops (the same tool called more than 5 times), and context window truncation warnings
  • Token anomalies flagged: high output ratio, low cache efficiency, total token spikes, unexpected model mixing
  • Optional high-cost failure alert when a failed run exceeds 50,000 tokens
  • Token data comes from token-usage.jsonl, written by gh-aw's firewall; no extra setup is needed beyond enabling the firewall (on by default)
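
A minimal sketch of the token checks listed above, assuming token-usage.jsonl holds one JSON object per request. The field names (input_tokens, output_tokens, cache_read_tokens, model), the analyse_tokens helper, and every threshold except the 50,000-token high-cost alert are hypothetical; check the real file's schema before relying on any of them. Cache-read tokens are assumed to be a subset of input tokens.

```python
import json
from dataclasses import dataclass, field

# Hypothetical cut-offs; the PR names these anomaly types but not their values.
MAX_OUTPUT_RATIO = 0.5      # output tokens / input tokens
MIN_CACHE_RATIO = 0.2       # cache-read tokens / input tokens
SPIKE_FACTOR = 3.0          # total tokens vs. a baseline total
HIGH_COST_FAILURE = 50_000  # from the PR notes: failed run above 50,000 tokens


@dataclass
class TokenReport:
    total: int = 0
    models: set = field(default_factory=set)
    flags: list = field(default_factory=list)


def analyse_tokens(jsonl_text, baseline_total=None, failed=False):
    """Sum per-request usage records and flag the anomaly types named in the PR."""
    inp = out = cached = 0
    models = set()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        # Hypothetical field names.
        inp += rec.get("input_tokens", 0)
        out += rec.get("output_tokens", 0)
        cached += rec.get("cache_read_tokens", 0)
        if "model" in rec:
            models.add(rec["model"])

    report = TokenReport(total=inp + out, models=models)
    if inp and out / inp > MAX_OUTPUT_RATIO:
        report.flags.append("high_output_ratio")
    if inp and cached / inp < MIN_CACHE_RATIO:
        report.flags.append("low_cache_efficiency")
    if baseline_total and report.total > SPIKE_FACTOR * baseline_total:
        report.flags.append("token_spike")
    if len(models) > 1:
        # Treats any mix as unexpected; a real check might allow known model pairs.
        report.flags.append("model_mixing")
    if failed and report.total > HIGH_COST_FAILURE:
        report.flags.append("high_cost_failure")
    return report
```

The spike check needs a baseline (for example, a moving average of earlier runs of the same workflow), which is why it is optional in this sketch.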

@dsyme
Contributor

dsyme commented May 8, 2026

@adamhenson @lpcox @pelikhan Looks like we also added https://github.com/githubnext/agentics/blob/main/docs/cost-tracker.md, probably inspired by your issue, Adam.

Could you three reconcile these please?

Great suggestion either way. I'm slightly concerned it may be expensive if it triggers often.

@pelikhan
Contributor

pelikhan commented May 8, 2026

@adamhenson maybe we can link to a report in your org as a community AW?

@adamhenson
Contributor Author

Good call on reconciling them. The intended split: cost-tracker answers "what did this run cost?" and log-watcher answers "what went wrong?" They share the same data source, but the output differs: a spend summary vs. a health diagnosis. Happy to clarify that in the docs for both, or to consolidate if you'd prefer a single workflow that does both.

The cost concern is fair. Log-watcher only produces verbose output on degraded or failed runs - healthy runs get a one-liner. That keeps the token footprint low for normal operation. Worth calling out explicitly in the docs either way.
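
A minimal sketch of how output verbosity can follow the health level, so that healthy runs produce only a collapsed one-liner. The classification rule, the classify and render helpers, and the comment layout are hypothetical; the PR defines the three levels but not how runs are assigned to them. The conclusion strings are GitHub's workflow_run conclusion values.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    FAILED = "Failed"


def classify(conclusion, findings):
    """Hypothetical rule: failed conclusion -> Failed; any finding -> Degraded."""
    if conclusion in ("failure", "timed_out"):
        return Health.FAILED
    if findings:
        return Health.DEGRADED
    return Health.HEALTHY


def render(health, findings, excerpts, max_excerpts=5):
    """Healthy runs get one collapsed line; others get findings plus log excerpts."""
    if health is Health.HEALTHY:
        return ("<details><summary>Log Watcher: Healthy</summary>"
                "No errors, retry loops or token anomalies detected.</details>")
    lines = [f"### Log Watcher: {health.value}", ""]
    lines += [f"- {f}" for f in findings]
    if excerpts:
        lines.append("")
        # Cap the excerpts so a noisy failure does not produce an oversized comment.
        lines += [f"> {e}" for e in excerpts[:max_excerpts]]
    return "\n".join(lines)
```

Capping excerpts is a second lever on the cost concern: even a Failed diagnosis stays bounded in size.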

@adamhenson
Contributor Author

@pelikhan that would be great - happy to contribute whatever format works. Are you thinking a link in the README, a dedicated docs page, or something else? Let me know what you have in mind and I'll put it together.


Labels

None yet

Projects

None yet


3 participants