Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to check its security and compatibility.
C++ | jmaczan/tiny-vllm

tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM

51.6/100
776 | Forks: 47

Similar Projects

uccl

80

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ | 1.4K

cactus

86

Low-latency AI engine for mobile devices & wearables

C++ | 5.3K

ZhiLight

58

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ | 905

ScaleLLM

61

A high-performance inference system for large language models, designed for production environments.

C++ | 500