Python · lasgroup/SDPO

SDPO

Reinforcement Learning via Self-Distillation (SDPO)

49.9/100
937 · Forks: 105

Similar Projects

TTRL

63

[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Python · 1.1K

PageIndex

80

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

Python · 32.8K

AReaL

88

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python · 5.3K

EasyR1

69

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 5.0K