Large Language Model Text Generation Inference
Nano vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Faster Whisper transcription with CTranslate2