Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
intel/neural-compressor (Python)

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Score: 89.8/100
Stars: 2.6K · Forks: 295
View on GitHub · Homepage →
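As a rough illustration of the low-bit quantization the description refers to, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. This is a conceptual example only, not neural-compressor's actual implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))     # bounded by roughly scale / 2
```

Real toolkits like neural-compressor go well beyond this sketch (per-channel scales, calibration data, mixed formats such as FP8/MXFP4), but the core idea of trading precision for a compact integer representation is the same.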

Similar Projects

nncf (Score: 81)

Neural Network Compression Framework for enhanced OpenVINO™ inference

Python · Stars: 1.1K

LightCompress (Score: 57)

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

Python · Stars: 684

llm-compressor (Score: 85)

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · Stars: 2.8K

OmniQuant (Score: 53)

[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python · Stars: 890