
Conversation

@ngxson (Collaborator) commented Apr 27, 2025

For M-RoPE, we want to use the normal 1D position for text tokens.

This is done to simplify the use of llama_decode() with text tokens, which is needed for adding Qwen2VL to libmtmd and to server.cpp.

This should also align with #11875, since in the future we want text positions to be tracked internally by libllama.
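
As an illustration of what this enables, here is a minimal, hypothetical sketch (not code from this PR; `decode_text` is an assumed helper name) of decoding a text-only prompt with plain 1D positions. After this change, the same batch layout should be usable for an M-RoPE model such as Qwen2VL:

```cpp
#include "llama.h"

#include <vector>

// Hypothetical helper: decode a text-only prompt using plain 1D positions.
// With this PR, the same layout should work for M-RoPE models (e.g. Qwen2VL),
// so callers no longer need to fill per-section M-RoPE positions for pure text.
static int decode_text(llama_context * ctx, const std::vector<llama_token> & tokens) {
    const int32_t n_tokens = (int32_t) tokens.size();

    llama_batch batch = llama_batch_init(n_tokens, /*embd*/ 0, /*n_seq_max*/ 1);

    for (int32_t i = 0; i < n_tokens; ++i) {
        batch.token   [i]    = tokens[i];
        batch.pos     [i]    = i; // normal 1D text position
        batch.n_seq_id[i]    = 1;
        batch.seq_id  [i][0] = 0;
        batch.logits  [i]    = 0;
    }
    batch.n_tokens = n_tokens;
    batch.logits[n_tokens - 1] = 1; // only request logits for the last token

    const int ret = llama_decode(ctx, batch);
    llama_batch_free(batch);
    return ret;
}
```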

@ngxson requested a review from ggerganov, April 27, 2025 16:25

```diff
 ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
-    auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_token(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
+    auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_embd(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
```
@ngxson (Collaborator, Author) commented Apr 27, 2025:

@ggerganov Because build_inp_attn_scale is currently used exclusively by llama 4, do you think we should get rid of n_pos_per_embd and replace it with a GGML_ASSERT(n_pos_per_embd() == 1)?

The main motivation is to make this code look less complicated, as there is ~0% chance the Qwen model is going to use this.
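
For reference, a rough sketch of what the assert-based variant suggested here could look like (an assumption based on the diff above, not the change that was ultimately merged):

```cpp
ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
    // attention-temperature scaling is currently only used by llama 4,
    // which uses plain 1D positions, so hard-code that assumption here
    GGML_ASSERT(n_pos_per_embd() == 1);

    auto inp = std::make_unique<llm_graph_input_attn_temp>(1, hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);

    // ... rest of the function unchanged
}
```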

@ggerganov (Member) replied:
Yes, we can do that.

@ngxson (Collaborator, Author) replied:

On second thought, build_inp_attn_scale should work well even in the case of N positions per token.

That's because the scale is applied per embedding, and the number of embeddings is independent of the number of positions per token.

In any case, I removed n_pos_per_embd in 9cd16a3; merging this PR once the CI is green.
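
To illustrate why this holds, a hedged sketch of the relevant sizes (the exact numbers are assumptions for the example, not taken from the merged code):

```cpp
// For a batch of n_tokens tokens:
//
//   n_pos_entries   = n_tokens * n_pos_per_embd(); // M-RoPE stores several position
//                                                  // components per token (e.g. 4)
//   n_scale_entries = n_tokens;                    // one temperature scale per token,
//                                                  // independent of n_pos_per_embd()
//
// so build_inp_attn_scale only needs the token count, not the position layout.
```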


@ngxson merged commit d2b2031 into ggml-org:master on Apr 28, 2025 (48 checks passed).