Frontier models playing the board game Diplomacy.
An agent benchmark with tasks in a simulated software company.
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Open-source continuous inference benchmarking of Qwen3.5, DeepSeek, and GPTOSS: GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100, with TPUv6e/v7 and Trainium2/3 coming soon.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.