Commit 0c13211

update model support, move img into media folder

Signed-off-by: Frida Hou <[email protected]>

1 parent e5d81ee commit 0c13211
File tree: 3 files changed (+72 −15 lines)
docs/source/torch/auto_deploy/auto-deploy.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ This project is in active development and is currently in a prototype stage. The
 AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models like those from HuggingFace transformers library, to TensorRT-LLM.
 
 <div align="center">
-<img src="./ad_overview.png" alt="AutoDeploy integration with LLM API" width="70%">
+<img src="../../media/ad_overview.png" alt="AutoDeploy integration with LLM API" width="70%">
 <p><em>AutoDeploy overview and relation with TensorRT-LLM's LLM api</em></p>
 </div>
 
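The caption above ties AutoDeploy to TensorRT-LLM's LLM API. As a rough illustration of that relation, here is a minimal, hypothetical sketch of constructing an AutoDeploy-backed `LLM`; the import path and the `compile_backend`/`attn_backend` keyword names are assumptions that mirror the configuration keys named in the `support_matrix.md` change below, not a verified API surface.

```python
# Hypothetical sketch only: the import path and keyword names are assumptions
# mirroring the configuration keys named in support_matrix.md
# (runtime, compile_backend, attn_backend); consult the AutoDeploy examples
# for the actual API.
from tensorrt_llm._torch.auto_deploy import LLM  # assumed import path

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any AutoModelForCausalLM-compatible checkpoint
    compile_backend="torch-compile",           # default per the updated support matrix
    attn_backend="flashinfer",                 # default per the updated support matrix
)

for output in llm.generate(["What does AutoDeploy do?"]):
    print(output)
```

If the real entry point differs, only the construction line should change; the configuration keys themselves come from the documentation text below.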
docs/source/torch/auto_deploy/support_matrix.md

Lines changed: 71 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -9,24 +9,81 @@ The exported graph then undergoes a series of automated transformations, includi
 **Bring Your Own Model**: AutoDeploy leverages `torch.export` and dynamic graph pattern matching, enabling seamless integration for a wide variety of models without relying on hard-coded architectures.
 
 We support Hugging Face models that are compatible with `AutoModelForCausalLM` and `AutoModelForImageTextToText`.
-Additionally, we have officially verified support for the following models:
+In addition, we have officially validated the following models using the default configuration: `runtime=trtllm`, `compile_backend=torch-compile`, and `attn_backend=flashinfer`.
 
 <details>
 <summary>Click to expand supported models list</summary>
 
-| Model Series | HF Model Card | Model Factory | Precision | World Size | Runtime | Compile Backend ||| Attention Backend |||
-|--------------|----------------------|----------------|-----------|------------|---------|-----------------|--------------------|--------------------|--------------------|----------|----------|
-| | | | | | | torch-simple | torch-compile | torch-opt | triton | flashinfer | MultiHeadLatentAttention |
-| LLaMA | meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Meta-Llama-3.1-8B-Instruct<br>meta-llama/Llama-3.1-70B-Instruct<br>codellama/CodeLlama-13b-Instruct-hf | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm |||||| n/a |
-| LLaMA-4 | meta-llama/Llama-4-Scout-17B-16E-Instruct<br>meta-llama/Llama-4-Maverick-17B-128E-Instruct | AutoModelForImageTextToText | BF16 | 1,2,4,8 | demollm, trtllm |||||| n/a |
-| Nvidia Minitron | nvidia/Llama-3_1-Nemotron-51B-Instruct<br>nvidia/Llama-3.1-Minitron-4B-Width-Base<br>nvidia/Llama-3.1-Minitron-4B-Depth-Base | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm |||||| n/a |
-| Nvidia Model Optimizer | nvidia/Llama-3.1-8B-Instruct-FP8<br>nvidia/Llama-3.1-405B-Instruct-FP8 | AutoModelForCausalLM | FP8 | 1,2,4 | demollm, trtllm |||||| n/a |
-| DeepSeek | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm |||||| n/a |
-| Mistral | mistralai/Mixtral-8x7B-Instruct-v0.1<br>mistralai/Mistral-7B-Instruct-v0.3 | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm |||||| n/a |
-| BigCode | bigcode/starcoder2-15b | AutoModelForCausalLM | FP32 | 1,2,4 | demollm, trtllm |||||| n/a |
-| Deepseek-V3 | deepseek-ai/DeepSeek-V3 | AutoModelForCausalLM | BF16 | 1,2,4 | demollm |||| n/a | n/a ||
-| Phi4 | microsoft/phi-4<br>microsoft/Phi-4-reasoning<br>microsoft/Phi-4-reasoning-plus | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm |||||| n/a |
-| Phi3/2 | microsoft/Phi-3-mini-4k-instruct<br>microsoft/Phi-3-mini-128k-instruct<br>microsoft/Phi-3-medium-4k-instruct<br>microsoft/Phi-3-medium-128k-instruct<br>microsoft/Phi-3.5-mini-instruct | AutoModelForCausalLM | BF16 | 1,2,4 | demollm, trtllm ||| ✅(partly) ||| n/a |
+- Qwen/QwQ-32B
+- Qwen/Qwen2.5-0.5B-Instruct
+- Qwen/Qwen2.5-1.5B-Instruct
+- Qwen/Qwen2.5-3B-Instruct
+- Qwen/Qwen2.5-7B-Instruct
+- Qwen/Qwen3-0.6B
+- Qwen/Qwen3-235B-A22B
+- Qwen/Qwen3-30B-A3B
+- Qwen/Qwen3-4B
+- Qwen/Qwen3-8B
+- TinyLlama/TinyLlama-1.1B-Chat-v1.0
+- apple/OpenELM-1_1B-Instruct
+- apple/OpenELM-270M-Instruct
+- apple/OpenELM-3B-Instruct
+- apple/OpenELM-450M-Instruct
+- bigcode/starcoder2-15b-instruct-v0.1
+- bigcode/starcoder2-7b
+- deepseek-ai/DeepSeek-Prover-V1.5-SFT
+- deepseek-ai/DeepSeek-Prover-V2-7B
+- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+- google/codegemma-7b-it
+- google/gemma-1.1-7b-it
+- google/gemma-2-27b-it
+- google/gemma-2-2b-it
+- google/gemma-2-9b-it
+- google/gemma-2b
+- google/gemma-3-1b-it
+- ibm-granite/granite-3.1-2b-instruct
+- ibm-granite/granite-3.1-8b-instruct
+- ibm-granite/granite-3.3-2b-instruct
+- ibm-granite/granite-3.3-8b-instruct
+- ibm-granite/granite-guardian-3.1-2b
+- ibm-granite/granite-guardian-3.2-5b
+- meta-llama/CodeLlama-34b-Instruct-hf
+- meta-llama/CodeLlama-7b-Instruct-hf
+- meta-llama/CodeLlama-7b-Python-hf
+- meta-llama/Llama-2-13b-chat-hf
+- meta-llama/Llama-2-7b-chat-hf
+- meta-llama/Llama-3.1-8B-Instruct
+- meta-llama/Llama-3.2-1B-Instruct
+- meta-llama/Llama-3.2-3B-Instruct
+- meta-llama/Llama-3.3-70B-Instruct
+- meta-llama/Llama-4-Maverick-17B-128E-Instruct
+- meta-llama/Llama-4-Scout-17B-16E-Instruct
+- microsoft/Phi-3-medium-128k-instruct
+- microsoft/Phi-3-medium-4k-instruct
+- microsoft/Phi-4-mini-instruct
+- microsoft/Phi-4-mini-reasoning
+- microsoft/Phi-4-reasoning
+- microsoft/Phi-4-reasoning-plus
+- microsoft/phi-4
+- mistralai/Codestral-22B-v0.1
+- mistralai/Mistral-7B-Instruct-v0.2
+- mistralai/Mistral-7B-Instruct-v0.3
+- mistralai/Mixtral-8x22B-Instruct-v0.1
+- nvidia/Llama-3.1-405B-Instruct-FP8
+- nvidia/Llama-3.1-70B-Instruct-FP8
+- nvidia/Llama-3.1-8B-Instruct-FP8
+- nvidia/Llama-3.1-Minitron-4B-Depth-Base
+- nvidia/Llama-3.1-Minitron-4B-Width-Base
+- nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
+- nvidia/Llama-3.1-Nemotron-Nano-8B-v1
+- nvidia/Llama-3_1-Nemotron-51B-Instruct
+- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
+- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1-FP8
+- nvidia/Llama-3_3-Nemotron-Super-49B-v1
+- nvidia/Mistral-NeMo-Minitron-8B-Base
+- perplexity-ai/r1-1776-distill-llama-70b
 
 </details>
 
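Because the support criterion above is compatibility with `AutoModelForCausalLM` (or `AutoModelForImageTextToText`), a quick smoke test for a candidate checkpoint is simply loading it through the standard `transformers` auto classes. The snippet below does this for one small entry from the validated list; it uses only well-established `transformers` and PyTorch calls.

```python
# Compatibility smoke test: a checkpoint that loads and generates via
# AutoModelForCausalLM meets the stated criterion for AutoDeploy support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # from the validated list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello from AutoDeploy!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```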