⚠ Notice: This resource is provided by a third-party author. Review the code with AI tools or manually before use to ensure security and compatibility.

Python · microsoft/sarathi-serve

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

49.0/100

★ 500 · Forks: 63


Similar Projects

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · ★ 80.2K

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python · ★ 3.8K

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · ★ 160.7K

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · ★ 27.9K
