Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
LeanModels/DFloat11 (Python)

DFloat11

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Score: 47.2/100
Stars: 608 · Forks: 37
View on GitHub

Similar Projects

ipex-llm

73

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Python · 8.7K

chitu

84

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python · 3.4K

headroom

79

The Context Optimization Layer for LLM Applications

Python · 699

langchain

94

The agent engineering platform

Python · 128.7K