Textbook on reinforcement learning from human feedback
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
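"AI-Compass" guides the community in navigating the ocean of AI technology: whether you are a beginner or an advanced developer, you can find a path here to each major area of AI. It aims to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through hands-on practice.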
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards