
convert: text-only support for GLM-4.1V-9B-Thinking #14823


Merged
merged 1 commit into ggml-org:master from glm4_thinking_support on Jul 23, 2025

Conversation


@jacekpoplawski (Contributor) commented on Jul 22, 2025

Fix #14495

This is my first attempt to contribute to llama.cpp.

I used Transformers to compare layers with GLM-4-9B-0414; the text-model structure appears identical.
The config for GLM-4.1V-9B-Thinking is missing the head_dim field.
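
Since config.json omits head_dim, a converter has to fall back to deriving it from the other attention fields. A minimal sketch of that fallback, assuming the standard Transformers config keys (illustration only, not the PR's exact code):

```python
import json

# Illustration: derive head_dim when config.json omits it.
# hidden_size // num_attention_heads is the usual convention
# when the field is absent.
with open("config.json") as f:
    config = json.load(f)

head_dim = config.get("head_dim")
if head_dim is None:
    head_dim = config["hidden_size"] // config["num_attention_heads"]
print(f"head_dim = {head_dim}")
```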


```
llama-cli -ngl 99 -n 1024 -m /mnt/models3/git/GLM-4.1V-9B-Thinking/GLM-9B-4.1V-Thinking-F16.gguf -sys "speak in english" 2>/dev/null

speak in english
> write one sentence about llama.cpp
<think>Got it, let's think about what to write about llama.cpp. It's a popular open-source project for running large language models efficiently, maybe on GPUs or specialized hardware. So a sentence
(...)
```

@github-actions github-actions bot added the python python script changes label Jul 22, 2025
@jacekpoplawski jacekpoplawski changed the title from "convert: text-only support for GLM-4.1V-9B-Thinking (#14495)" to "convert: text-only support for GLM-4.1V-9B-Thinking" on Jul 22, 2025
* use language_model part only, ignore visual layers

* fix rope_dim calculation
@jacekpoplawski jacekpoplawski force-pushed the glm4_thinking_support branch from ad66a8f to d959644 on July 23, 2025 at 20:01
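
To make the two commit-message bullets above concrete: a text-only conversion can drop the vision tower, remap the language_model tensors back onto the plain GLM-4 layout, and compute the rotary dimensions from head_dim and a partial-rotary factor. A rough sketch follows; the tensor-name prefixes and the 0.5 factor are assumptions for illustration, not the merged change's literal code:

```python
# Hypothetical sketch of the two fixes; prefixes and the partial-rotary
# factor are assumptions, not taken from the merged PR.
def map_tensor_name(name: str) -> str | None:
    if name.startswith("model.visual."):
        return None  # text-only conversion: skip vision-tower tensors
    # fold the language_model subtree back into the plain GLM-4 layout
    return name.replace("model.language_model.", "model.", 1)

def rope_dims(hidden_size: int, n_heads: int,
              partial_rotary_factor: float = 0.5) -> int:
    # GLM-4-style partial RoPE rotates only a fraction of each head
    head_dim = hidden_size // n_heads
    return int(head_dim * partial_rotary_factor)
```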
@CISC CISC merged commit a12363b into ggml-org:master Jul 23, 2025
5 checks passed
@ddpasa (Contributor) commented on Jul 24, 2025

Please do the mmproj next! This VLM is supposed to be really good.

taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025
* use language_model part only, ignore visual layers

* fix rope_dim calculation
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 25, 2025
* origin/master:
docs : update HOWTO-add-model.md for ModelBase and new model classes (ggml-org#14874)
ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
sync : ggml
cmake : fix usage issues (ggml/1257)
ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
context : perform output reorder lazily upon access after sync (ggml-org#14853)
chat : fix kimi-k2 chat template (ggml-org#14852)
sycl: fixed semantics of block offset calculation (ggml-org#14814)
llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
metal : fix fusion across different encoders (ggml-org#14849)
sycl: fix undefined variable in work group size check (ggml-org#14843)
convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
CUDA: fix overflow in FA, tune performance (ggml-org#14840)
CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)
@danielhanchen (Contributor) commented:

I made some quants! https://huggingface.co/unsloth/GLM-4.1V-9B-Thinking-GGUF

Labels: python (python script changes)
Successfully merging this pull request may close these issues:
Feature Request: Support GLM-4.1V-9B-Thinking (#14495)
4 participants