LLMs as Copilots for Theorem Proving in Lean
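This is the LeanCopilot tagline: a language model proposes tactics inside an interactive Lean proof, and Lean's kernel checks every suggestion before it reaches the user, so the model can speed proofs up but never make them wrong. A minimal sketch of the workflow, assuming the LeanCopilot package and its `suggest_tactics` tactic as described in its README (theorem name and goal are illustrative):

```lean
import LeanCopilot  -- assumes LeanCopilot is a dependency of the Lean project

-- The copilot sees the goal `a b c : Nat ⊢ a + b + c = a + c + b`,
-- queries the LLM for candidate tactics, and surfaces only those that
-- Lean itself verifies against the current goal.
theorem add_swap (a b c : Nat) : a + b + c = a + c + b := by
  suggest_tactics  -- interactive: shows checked suggestions, e.g. `omega`
```

The design point is that the LLM is untrusted: the type checker vets each candidate, so accepted proofs are correct by construction.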
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
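"No libraries except for I/O" means the numerical core (matmuls, attention, sampling) is written by hand against the bare CUDA runtime rather than cuBLAS or cuDNN. A minimal sketch of that style, not taken from the project's actual code: a naive hand-rolled matrix-vector kernel, the workhorse of single-batch decoding (names and sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>  // CUDA runtime only; no cuBLAS/cuDNN

// Hand-rolled y = W·x. One thread per output row; illustrative, not tuned
// (a real kernel would tile, vectorize, and coalesce loads).
__global__ void matvec(const float* W, const float* x, float* y,
                       int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float acc = 0.0f;
    for (int c = 0; c < cols; ++c)
        acc += W[(size_t)r * cols + c] * x[c];
    y[r] = acc;
}

int main() {
    const int rows = 4, cols = 3;
    float hW[rows * cols] = {1,0,0,  0,1,0,  0,0,1,  1,1,1};  // 4x3
    float hx[cols] = {2, 3, 4};
    float hy[rows];

    float *dW, *dx, *dy;
    cudaMalloc((void**)&dW, sizeof(hW));
    cudaMalloc((void**)&dx, sizeof(hx));
    cudaMalloc((void**)&dy, sizeof(hy));
    cudaMemcpy(dW, hW, sizeof(hW), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

    matvec<<<(rows + 255) / 256, 256>>>(dW, dx, dy, rows, cols);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);

    for (int r = 0; r < rows; ++r)
        printf("y[%d] = %g\n", r, hy[r]);  // expect 2 3 4 9
    cudaFree(dW); cudaFree(dx); cudaFree(dy);
    return 0;
}
```

A production kernel would add tiling, vectorized loads, and quantized weights, but the dependency story stays the same: only the CUDA runtime and standard I/O.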
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
High-speed Large Language Model Serving for Local Deployment
Distributed LLM inference. Connect home devices into a powerful cluster; more devices means faster inference.
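The speedup claim rests on sharding: each device holds a slice of every weight matrix and computes only its slice of each matmul, so per-device work shrinks roughly as 1/N until network communication dominates. A toy C++ sketch of the row-sharding idea, with threads standing in for networked home devices (the `shard_matvec` helper and all names are hypothetical, not the project's API):

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// Toy row-sharded matvec: each "device" (a thread here, a networked box in
// a real cluster) owns a contiguous slice of W's rows and computes only its
// slice of y = W·x. With D devices each does ~rows/D of the work, which is
// why adding devices speeds up inference (until communication dominates).
void shard_matvec(const std::vector<float>& W, const std::vector<float>& x,
                  std::vector<float>& y, int rows, int cols,
                  int dev, int num_dev) {
    int begin = rows * dev / num_dev, end = rows * (dev + 1) / num_dev;
    for (int r = begin; r < end; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < cols; ++c) acc += W[r * cols + c] * x[c];
        y[r] = acc;  // disjoint rows per device: no write conflicts
    }
}

int main() {
    const int rows = 8, cols = 4, num_dev = 2;
    std::vector<float> W(rows * cols, 1.0f), x(cols, 1.0f), y(rows);
    std::vector<std::thread> devices;
    for (int d = 0; d < num_dev; ++d)
        devices.emplace_back(shard_matvec, std::cref(W), std::cref(x),
                             std::ref(y), rows, cols, d, num_dev);
    for (auto& t : devices) t.join();  // a real cluster all-gathers y here
    printf("y[0] = %g (expect %d)\n", y[0], cols);
    return 0;
}
```

In a real cluster the join is an all-gather over the network, which is why the scaling eventually flattens as link bandwidth, not compute, becomes the bottleneck.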