Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · sierra-research/tau2-bench

tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

78.1/100
1.3K · Forks: 334

Similar Projects

InferenceX

70

Open Source Continuous Inference Benchmark Research Platform Kimi K2.6, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

Python · 1.1K

AI_Diplomacy

64

Frontier Models playing the board game Diplomacy.

Python · 669

meta-agents-research-environments

71

Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.

Python · 513

hermes-agent

90

The agent that grows with you

Python · 188.9K