Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.

Python · 0xSero/turboquant

turboquant

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
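To put "3-bit keys, 2-bit values" in perspective, here is a rough memory estimate. It is a minimal sketch and not part of the turboquant repository: the function name `kv_cache_bytes` and the model shape are hypothetical. It counts only the quantized payload and ignores any scales, norms, or other metadata that a real format (TurboQuant's included) stores, so actual savings will be somewhat lower than the ideal ratio it prints.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   key_bits, value_bits, batch_size=1):
    """Payload size of a KV cache in bytes (hypothetical helper).

    Assumes every key element uses `key_bits` and every value element uses
    `value_bits`; per-group scales or other quantization metadata are ignored.
    """
    # Number of elements in ONE of the two tensors (K or V) across all layers.
    elems_per_tensor = batch_size * num_layers * num_kv_heads * head_dim * seq_len
    # Keys and values have the same shape, so total bits = elems * (kb + vb).
    total_bits = elems_per_tensor * (key_bits + value_bits)
    return total_bits / 8


if __name__ == "__main__":
    # Hypothetical shape: 32 layers, 8 KV heads, head_dim 128, 32K tokens.
    shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)
    fp16 = kv_cache_bytes(**shape, key_bits=16, value_bits=16)
    quant = kv_cache_bytes(**shape, key_bits=3, value_bits=2)
    print(f"fp16 KV cache:     {fp16 / 2**30:.2f} GiB")  # 4.00 GiB
    print(f"3-bit K / 2-bit V: {quant / 2**20:.0f} MiB")  # 640 MiB
    print(f"reduction: {fp16 / quant:.1f}x")            # 6.4x
```

The ideal reduction is simply 32 / (3 + 2) = 6.4x per cached token, independent of model shape; the shape only sets the absolute sizes.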

Score: 45.3/100

★ 1.4K · Forks: 171

Similar Projects

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python · ★ 184.3K

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · ★ 160.6K

hermes-agent

The agent that grows with you

Python · ★ 152.1K

langflow

Langflow is a powerful tool for building and deploying AI-powered agents and workflows.

Python · ★ 148.1K