
Add support for CogVLM model #15002


Open
Tianyue-Zhao wants to merge 12 commits into master

Conversation

Tianyue-Zhao

This addresses the requests for CogVLM support in #4387 and #4350.
CogVLM is a fairly popular model that now integrates cleanly after the recent additions to libmtmd.
I've converted a GGUF here: Link to GGUF files

Sample command and output:

build/bin/llama-mtmd-cli -m ../cogvlm-chat-hf/cogvlm-13B-chat-v1.1-F16.gguf --mmproj ../cogvlm-chat-hf/mmproj-cogvlm-chat-hf --image ./community.png --chat-template vicuna -p "Describe the picture"

load_hparams: model size:         8448.53 MiB
load_hparams: metadata size:      0.36 MiB
alloc_compute_meta:        CPU compute buffer size =   142.02 MiB
main: loading model: ../cogvlm-chat-hf/cogvlm-13B-chat-v1.1-F16.gguf
encoding image slice...
image slice encoded in 16135 ms
decoding image batch 1/1, n_tokens_batch = 1227
image decoded (batch 1/1) in 54065 ms

1. The image showcases a futuristic urban landscape with a mix of architectural styles. The buildings are multi-storied and have a combination of traditional and modern elements. There's a prominent tree in the foreground, suggesting a blend of nature and urban development. The scene appears to be bustling with activity, with various signs and billboards, indicating commercial or residential zones.


llama_perf_context_print:        load time =  108969.65 ms
llama_perf_context_print: prompt eval time =   85229.27 ms /  1241 tokens (   68.68 ms per token,    14.56 tokens per second)
llama_perf_context_print:        eval time =   19843.15 ms /    83 runs   (  239.07 ms per token,     4.18 tokens per second)
llama_perf_context_print:       total time =  126951.23 ms /  1324 tokens
llama_perf_context_print:    graphs reused =          0

@github-actions github-actions bot added examples python python script changes labels Aug 1, 2025
@Tianyue-Zhao Tianyue-Zhao marked this pull request as ready for review August 1, 2025 02:15
Tianyue-Zhao (Author)

I think I've fixed the typecheck and format-check workflows that were failing before; could someone approve the workflows to run again?
Also, is there a way to run these GitHub workflows locally, or without needing approval from a reviewer?
It would be good to run these CI/CD checks myself before posting the PR.

CISC (Collaborator) commented Aug 2, 2025

> Also, is there a way to run these GitHub workflows locally or without needing approval from a reviewer? It would be good to run these CI/CD checks myself before posting the PR.

You can run flake8, pyright and editorconfig locally (or via IDE plugins); the build tests can be run manually with ctest.
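
For reference, something like this from the repo root (a rough local approximation, not the exact CI invocations; assumes the tools pick up the repo's config files):

flake8                                   # Python style/lint checks
pyright                                  # Python type checks
editorconfig-checker                     # whitespace/formatting checks
cmake -B build && cmake --build build    # build
ctest --test-dir build                   # run the build tests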

CISC (Collaborator) left a comment

This is not a complete review as I don't know enough about mtmd, just commenting...

Comment on lines +7934 to +7939

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.hparams['num_attention_heads'] = self.hparams['num_heads']

    def set_gguf_parameters(self):

Suggested change
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        self.hparams['num_attention_heads'] = self.hparams['num_heads']
-
-    def set_gguf_parameters(self):
+    def set_gguf_parameters(self):

Add num_heads to the list here instead:

self.gguf_writer.add_vision_head_count(self.find_vparam(["num_attention_heads"]))
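
In other words, something like this (assuming find_vparam tries the listed keys in order and returns the first one present):

self.gguf_writer.add_vision_head_count(self.find_vparam(["num_attention_heads", "num_heads"]))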

Comment on lines +7950 to +7959
if "query_key_value" in name:
# Split tensor into three along first axis
q, k, v = data_torch.split(data_torch.shape[0] // 3, dim=0)
return [
(self.map_tensor_name(name.replace("query_key_value", "query")), q),
(self.map_tensor_name(name.replace("query_key_value", "key")), k),
(self.map_tensor_name(name.replace("query_key_value", "value")), v),
]

return [(self.map_tensor_name(name), data_torch)]

Suggested change
-        if "query_key_value" in name:
-            # Split tensor into three along first axis
-            q, k, v = data_torch.split(data_torch.shape[0] // 3, dim=0)
-            return [
-                (self.map_tensor_name(name.replace("query_key_value", "query")), q),
-                (self.map_tensor_name(name.replace("query_key_value", "key")), k),
-                (self.map_tensor_name(name.replace("query_key_value", "value")), v),
-            ]
-
-        return [(self.map_tensor_name(name), data_torch)]
+        return [(self.map_tensor_name(name), data_torch)]

Create Q/K/V views at build time instead (check other (non-mm) models for examples).
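
For illustration, a minimal sketch of that pattern (hypothetical names: wqkv for the fused tensor, with the usual ctx0/n_embd/n_tokens locals from the build functions; assumes Q, K and V are equal-sized, matching the split above):

// project once with the fused query_key_value weight...
ggml_tensor * qkv = ggml_mul_mat(ctx0, model.layers[il].wqkv, cur);
// ...then take byte-offset views instead of storing separate Q/K/V tensors
ggml_tensor * Qcur = ggml_view_2d(ctx0, qkv, n_embd, n_tokens, qkv->nb[1], 0*sizeof(float)*n_embd);
ggml_tensor * Kcur = ggml_view_2d(ctx0, qkv, n_embd, n_tokens, qkv->nb[1], 1*sizeof(float)*n_embd);
ggml_tensor * Vcur = ggml_view_2d(ctx0, qkv, n_embd, n_tokens, qkv->nb[1], 2*sizeof(float)*n_embd);

This keeps the checkpoint layout intact and avoids the tensor surgery in the conversion script.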

Comment on lines +7966 to +7969
    def set_gguf_parameters(self):
        super().set_gguf_parameters()

    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:

Suggested change
-    def set_gguf_parameters(self):
-        super().set_gguf_parameters()
-
-    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
+    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:

Comment on lines +7976 to +7985
if "query_key_value.weight" in name:
# Slice tensor into three along first axis
q, k, v = data_torch.split(data_torch.shape[0] // 3, dim=0)
return [
(self.map_tensor_name(name.replace("query_key_value", "query")), q),
(self.map_tensor_name(name.replace("query_key_value", "key")), k),
(self.map_tensor_name(name.replace("query_key_value", "value")), v),
]

return [(self.map_tensor_name(name), data_torch)]

Suggested change
-        if "query_key_value.weight" in name:
-            # Slice tensor into three along first axis
-            q, k, v = data_torch.split(data_torch.shape[0] // 3, dim=0)
-            return [
-                (self.map_tensor_name(name.replace("query_key_value", "query")), q),
-                (self.map_tensor_name(name.replace("query_key_value", "key")), k),
-                (self.map_tensor_name(name.replace("query_key_value", "value")), v),
-            ]
-
-        return [(self.map_tensor_name(name), data_torch)]
+        return [(self.map_tensor_name(name), data_torch)]

Comment on lines +17659 to +17660
Qcur = ggml_rope(ctx0, Qcur, inp_pos, n_embd_head, GGML_ROPE_TYPE_NEOX);
Kcur = ggml_rope(ctx0, Kcur, inp_pos, n_embd_head, GGML_ROPE_TYPE_NEOX);

Update llama_model_rope_type instead and use rope_type.
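
A sketch of what that change might look like (assumes this PR introduces an LLM_ARCH_COGVLM enum; the rope_type/freq_base/etc. locals follow the usual build-function conventions):

// in llama_model_rope_type(), group the new arch with the NEOX-style models
case LLM_ARCH_COGVLM:
    return LLAMA_ROPE_TYPE_NEOX;

// the graph code can then use the per-model rope_type rather than
// hard-coding GGML_ROPE_TYPE_NEOX
Qcur = ggml_rope_ext(ctx0, Qcur, inp_pos, nullptr, n_embd_head, rope_type,
                     n_ctx_orig, freq_base, freq_scale, ext_factor,
                     attn_factor, beta_fast, beta_slow);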

Tianyue-Zhao (Author)

> Also, is there a way to run these GitHub workflows locally or without needing approval from a reviewer? It would be good to run these CI/CD checks myself before posting the PR.
>
> You can run flake8, pyright and editorconfig locally (or via IDE plugins); the build tests can be run manually with ctest.

Thanks for the info! That's something I've been wondering about for a while.
