[ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.5, GPT-OSS, Llama, and more!
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
Awesome Reasoning LLM Tutorial/Survey/Guide
AgentFlow: In-the-Flow Agentic System Optimization