- vllm: A high-throughput and memory-efficient inference and serving engine for LLMs (Python, 63.8k stars, 11.5k forks)
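For context, here is a minimal sketch of offline batch generation with vLLM's Python API (the model name and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# Load a small example model; vLLM manages KV-cache memory with PagedAttention.
llm = LLM(model="facebook/opt-125m")

# Illustrative sampling settings.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["The capital of France is", "vLLM is"]
outputs = llm.generate(prompts, params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

The same engine also exposes an OpenAI-compatible HTTP server via the `vllm serve` command.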
- llm-compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python, 2.3k stars, 292 forks)
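As a rough sketch of the one-shot compression workflow (the checkpoint, quantization scheme, and argument names follow the repo's quickstart pattern but should be checked against the current README):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Illustrative recipe: FP8 dynamic quantization of Linear layers, skipping lm_head.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the recipe in one shot; the saved model can be loaded directly by vLLM.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example checkpoint
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-FP8-Dynamic",
)
```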
- recipes: Common recipes to run vLLM (Jupyter Notebook, 240 stars, 85 forks)
- vllm-ascend: Community-maintained hardware plugin for vLLM on Huawei Ascend
- tpu-inference: TPU inference for vLLM, with unified JAX and PyTorch support
- vllm-gaudi: Community-maintained hardware plugin for vLLM on Intel Gaudi
- vllm-xpu-kernels: vLLM XPU kernels for Intel GPUs
- production-stack: Cost-efficient and pluggable infrastructure components for GenAI inference
- guidellm: Evaluate and enhance your LLM deployments for real-world inference needs
- ci-infra: Code for the vLLM CI and performance benchmark infrastructure
- semantic-router: Intelligent router for mixture-of-models
- vllm-spyre: Community-maintained hardware plugin for vLLM on IBM Spyre