Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · predibase/lorax

lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

85.7/100
Stars: 3.8K · Forks: 312
View on GitHub · Homepage

Similar Projects

vllm

93

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 80.1K stars

OpenLLM

89

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

Python · 12.3K stars

peft

91

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python · 21.1K stars

BentoML

88

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.

Python · 8.6K stars