Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · ServerlessLLM/ServerlessLLM

ServerlessLLM

Serverless LLM Serving for Everyone.

79.4/100
662 · Forks: 66

Similar Projects

vllm

93

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 72.4K

ml-engineering

74

Machine Learning Engineering Open Book

Python · 17.3K

TensorRT-LLM

89

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

Python · 13.0K

LMCache

87

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 7.6K