Popular repositories
- skillsbench (Public): SkillsBench evaluates how well skills work and how effective agents are at using them.
- awesome-evals (Public): A curated, non-BS library of the best resources for building and evaluating AI agents: papers, blogs, talks, tools, and benchmarks. Maintained by BenchFlow.
Repositories
- awesome-evals (Public)
- skillsbench (Public)
- env0 (Public)
- skillsbench-leaderboard (Public)
- mini-swe-agent (Public, forked from SWE-agent/mini-swe-agent): The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, with no huge configs and no giant monorepo, but it scores >74% on SWE-bench Verified.
- skillsbench-trajectories (Public)