intel/neural-compressor (Python)

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
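To make the listed formats concrete, here is a minimal pure-Python sketch of symmetric INT8 post-training quantization, the basic idea behind low-bit compression. This is an illustration only, not neural-compressor's actual API; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to int8 range [-127, 127] with a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Per-element reconstruction error is bounded by scale / 2
```

Formats like INT4 or NVFP4 follow the same quantize/dequantize pattern with fewer bits and finer-grained (per-group) scales.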

Score: 89.9/100

Stars: 2.6K · Forks: 303

Similar Projects

nncf

Score: 81

Neural Network Compression Framework for enhanced OpenVINO™ inference

Python · 1.1K stars

LightCompress

Score: 68

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

Python · 706 stars

Chinese-LLaMA-Alpaca

Score: 90

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Python · 18.9K stars

OmniQuant

Score: 51

[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python · 891 stars