⚠ Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.

Python · microsoft/MInference

MInference

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] MInference speeds up inference for long-context LLMs by computing attention with approximate, dynamic sparse methods, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

57.7/100

★ 1.2K · Forks: 80



Similar Projects

hermes-agent

The agent that grows with you

Python · ★ 220.0K

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python · ★ 185.7K

markitdown

Python tool for converting files and office documents to Markdown.

Python · ★ 168.8K

skills

Public repository for Agent Skills

Python · ★ 164.0K
