⚠ Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.

Python · natolambert/rlhf-book

rlhf-book

Textbook on reinforcement learning from human feedback

80.9/100

★ 1.9K · Forks: 183

Similar Projects

distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python · ★ 3.2K

AI-Compass

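"AI-Compass" aims to guide the community through the ocean of AI technology: whether you are a beginner or an advanced developer, you can find here a path to each major area of AI. It is intended to help developers systematically understand AI's core concepts, mainstream technologies, and emerging trends, and to master the full process from theory to deployment through practice.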

Python · ★ 686

oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python · ★ 650

OpenJudge

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

Python · ★ 570
