Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)
Supercharge Your LLM with the Fastest KV Cache Layer
LLM KV cache compression made easy
Notes on LLMs, covering model inference, transformer model architecture, and LLM framework code analysis.
The agent engineering platform