Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to ensure security and compatibility.

Python · THUDM/AgentBench

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

51.0/100

★ 3.6K · Forks: 272

Similar Projects

langroid

Harness LLMs with Multi-Agent Programming

Python · ★ 4.1K

gptme

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonomous agent on top!

Python · ★ 4.4K

AutoPR

AutoPR autonomously wrote pull requests in response to issues

Python · ★ 1.4K

vim-ai

AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim and Neovim.

Python · ★ 1.2K
