Popular repositories

  1. skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL 944 242

  2. benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    Python 198 15

  3. pokemon-gym Public

    Python 92 8

  4. ClawsBench Public

    Repository for results and data (coming soon!) for ClawsBench

    8

  5. jfkarena Public

    TypeScript 7

  6. llm-builds-linux Public

    Python 6 1

Repositories

Showing 10 of 15 repositories
  • benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    Python 198 Apache-2.0 15 0 0 Updated Apr 10, 2026
  • ClawsBench Public

    Repository for results and data (coming soon!) for ClawsBench

    8 0 0 0 Updated Apr 8, 2026
  • skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL 944 Apache-2.0 242 5 213 Updated Mar 27, 2026
  • harbor Public Forked from harbor-framework/harbor

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python 2 Apache-2.0 890 0 0 Updated Mar 25, 2026
  • agent-client-protocol Public Forked from agentclientprotocol/agent-client-protocol

    A protocol for connecting any editor to any agent

    Rust 0 Apache-2.0 215 0 0 Updated Mar 15, 2026
  • cli Public Forked from googleworkspace/cli

    Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.

    Rust 0 Apache-2.0 1,232 0 0 Updated Mar 15, 2026
  • gepa Public Forked from gepa-ai/gepa

    Optimize prompts, code, and more with AI-powered Reflective Text Evolution

    Jupyter Notebook 0 MIT 291 0 0 Updated Mar 3, 2026
  • terminal-bench-3 Public Forked from harbor-framework/terminal-bench-3

    🚧 Accepting Task Submissions 🚧

    Python 0 121 0 0 Updated Feb 17, 2026
  • harbor-datasets Public

    Python 0 95 0 0 Updated Feb 12, 2026
  • skillsbench-trajectories Public

    Python 4 1 0 0 Updated Feb 11, 2026