Fully automatic censorship removal for language models
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
Nano vLLM
Community-maintained hardware plugin for vLLM on Ascend