Notice:This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
PythonVibeBench/VibeSearchBench
VibeSearchBench
🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.