⚠ Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to ensure security and compatibility.

C++ vectorch-ai/ScaleLLM

ScaleLLM

A high-performance inference system for large language models, designed for production environments.

60.4/100

★ 500 Forks: 41

Similar Projects

cactus

Low-latency AI engine for mobile devices & wearables

C++ ★ 5.3K

vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

C++ ★ 2.2K

ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ ★ 905

tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM

C++ ★ 797