Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · natolambert/rlhf-book

rlhf-book

Textbook on reinforcement learning from human feedback

82.3/100
2.0K · Forks: 204

Similar Projects

distilabel

85

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python · 3.2K

AI-Compass

67

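AI-Compass guides the community in navigating the ocean of AI technology. Whether you are a beginner or an advanced developer, you can find paths here to every major area of AI. It aims to help developers systematically understand AI's core concepts, mainstream technologies, and frontier trends, and to master the full process from theory to deployment through hands-on practice.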

Python · 783

oat

62

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python · 660

OpenJudge

76

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

Python · 654