Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
C++ | Tiiny-AI/PowerInfer

PowerInfer

High-speed Large Language Model Serving for Local Deployment

65.1/100
9.5K | Forks: 578

Similar Projects

lemonade

85

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

C++ | 4.3K

ZhiLight

58

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ | 905

deeplake

87

Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

C++ | 9.2K

cactus

86

Low-latency AI engine for mobile devices & wearables

C++ | 5.3K