Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · natolambert/rlhf-book

rlhf-book

Textbook on reinforcement learning from human feedback

80.9/100
1.9K · Forks: 183

Similar Projects

distilabel

86

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python · 3.2K

AI-Compass

66

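"AI-Compass" aims to guide the community through the sea of AI technologies: whether you are a beginner or an advanced developer, you can find a path here to each major area of AI. It is intended to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through practice.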

Python · 686

oat

65

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python · 650

OpenJudge

75

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

Python · 570