⚠

Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.

C++ cactus-compute/cactus

cactus

Quantization, kernels, inference engine for mobiles, wearables, smart home and robots.

86.1/100

★ 5.5K Forks: 450



Similar Projects

runanywhere-sdks

Production ready toolkit to run AI locally

C++ ★ 10.3K

distributed-llama

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.

C++ ★ 3.0K

llama.rn

React Native binding of llama.cpp

C++ ★ 1.0K

tiny-vllm

Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM

C++ ★ 946
