A command-line interface tool for serving LLMs using vLLM.
Notes on LLMs, covering model inference, transformer model structure, and LLM framework code analysis.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer