Distribute and run LLMs with a single file.
Talk to your Mac and query your docs, no cloud required. On-device voice AI + RAG.
Fast LLM speculative inference server for consumer hardware.
Port of OpenAI's Whisper model in C/C++.
High-speed Large Language Model Serving for Local Deployment.