Motivation
This RFC proposes adding recipes for multimodal models that vLLM already supports, such as Qwen2.5-VL and InternVL3, as well as models that may be supported in the future, such as Qwen2.5-Omni Talker #16347 and VILA #11887.
Compared with pure LLMs, multimodal models have widely varying processing pipelines for inputs such as images, video, and audio. There is therefore a pressing need to clarify the input format, usage, and corresponding performance of each model across different tasks.
In addition, RFC #4194 outlines the roadmap for multi-modality support alongside the V1 refactor; many of its features are complete while the rest are in progress. It is also important to provide up-to-date evaluation and performance results on common benchmarks for each multimodal model under the updated architecture.
Proposed points for each recipe
- Hyperparameters for different tasks
- Input/output processing methods and examples (see the sketch after this list)
- Evaluation results on typical benchmarks
- Performance data on specific hardware architectures
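As an illustration of the input/output processing point, here is a minimal sketch of offline multimodal inference with vLLM's `LLM` API, assuming the Qwen/Qwen2.5-VL-7B-Instruct checkpoint; the image file name is hypothetical, and the exact prompt template and placeholder tokens differ per model, which is precisely what each recipe should document.

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load a multimodal model for offline inference (model choice is an example).
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", max_model_len=8192)

# Hypothetical local image used as the multimodal input.
image = Image.open("demo.jpg")

# Qwen2.5-VL-style chat prompt with an image placeholder; other models
# (e.g. InternVL3) use different placeholder tokens and chat templates.
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)

# Pass the image alongside the prompt via multi_modal_data.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

Each recipe would likely also cover the online serving path (`vllm serve <model>` queried through the OpenAI-compatible chat API with `image_url` content), since that is the setup most benchmark and performance measurements use.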
Proposed models to add recipes for (including but not limited to):
- Qwen2.5-VL (Add Qwen2.5VL Guide #30)
- InternVL3 (Add InternVL3 Guide #35)
- Skywork R1V
- Granite-speech-3.3-8b
- Llama 4 (Add recipes for Llama3.3 70B and Llama4 Scout #13)
- Gemma3
- GLM-4.5 (GLM-4.5 and GLM-4.5V #23)
- Qwen2.5-Omni (Talker not supported in vLLM yet)
- VILA (Not supported in vLLM yet)
- BAGEL (Not supported in vLLM yet)