Low-latency AI engine for mobile devices & wearables
Production-ready toolkit to run AI locally
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
React Native binding of llama.cpp
Build your own high-performance LLM inference engine in C++ and CUDA - a smaller version of vLLM