⚠ Notice: This resource is provided by a third-party author. Review the code with AI tools or manually before use to ensure security and compatibility.

C++ · jmaczan/tiny-vllm

tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM.

60.5/100

★ 946 · Forks: 67

Similar Projects

uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · ★ 1.5K

cactus

Quantization, kernels, inference engine for mobiles, wearables, smart home and robots.

C++ · ★ 5.5K

ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · ★ 905

LLM-Hub

Local AI Assistant on your phone

C++ · ★ 512
