Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · vllm-project/vllm

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

92.6/100
82.4K · Forks: 17.9K

Similar Projects

lorax

82

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python · 3.8K

sglang

91

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 28.9K

LMCache

88

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python · 8.5K

nano-vllm

59

Nano vLLM

Python · 14.0K