Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
intel/neural-compressor (Python)

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Score: 89.8/100
Stars: 2.6K · Forks: 295
View on GitHub · Homepage →
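As a rough illustration of the low-bit quantization the description refers to, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. This is a conceptual example only, not neural-compressor's actual implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))     # bounded by roughly scale / 2
```

Real toolkits like neural-compressor go well beyond this sketch (per-channel scales, calibration data, mixed formats such as FP8/MXFP4), but the core idea of trading precision for a compact integer representation is the same.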

Similar Projects

nncf (Score: 81)

Neural Network Compression Framework for enhanced OpenVINO™ inference

Python · Stars: 1.1K

LightCompress (Score: 57)

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

Python · Stars: 684

llm-compressor (Score: 85)

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · Stars: 2.8K

OmniQuant (Score: 53)

[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python · Stars: 890