granite embedding small support (ModernBert arch) #15641

ryan-mangeno · 2025-08-28T17:03:43Z

Adding support for running granite embedding small, which primarily pulls in the ModernBert architecture: https://huggingface.co/ibm-granite/granite-embedding-small-english-r2. Still working on it. I haven't figured out the pre-tokenizer type yet, or whether I need to implement it. Also, for the ubatch size, the assert in llama-graph.cpp fails. I hacked it to accept a ubatch size of 1 for testing, but it seems to keep failing there and I'm not sure why.

If I comment out this line in llama-graph.cpp:

assert(!ubatch.equal_seqs());

then it works.


ryan-mangeno · 2025-08-28T17:12:46Z

@gabe-l-hart thanks in advance :)

ryan-mangeno · 2025-08-28T17:14:13Z


Also realizing this a little late haha, but should I be changing all of the ModernBert stuff to a granite embedding macro like LLM_ARCH_GRANITE_EMBD, or keep it as is?

CISC · 2025-08-28T17:14:43Z

You may want to check out an earlier attempt at ModernBert in #14014

gabe-l-hart · 2025-08-28T17:19:26Z

Thanks for getting this together @ryan-mangeno and thanks for pointing out the previous work @CISC. Ryan, let me know if/when you've looked over that PR and found anything to fix and I'll take a pass at review.

gabe-l-hart · 2025-08-28T17:21:42Z

> also realizing this a little late haha, but should I be changing all of the modern bert stuff to a granite embedding macro like LLM_ARCH_GRANITE_EMBD or keep it as is

In general, we want to keep things as generic as possible, so since this uses the ModernBertModel architecture from transformers, it's best to keep the implementation here similarly robust unless there's a concrete reason to subset the transformers architecture to just work for granite (e.g., there's some non-trivial code path in the transformers version that would make sense as a separate architecture).

ryan-mangeno · 2025-08-28T19:15:45Z

> Thanks for getting this together @ryan-mangeno and thanks for pointing out the previous work @CISC. Ryan, let me know if/when you've looked over that PR and found anything to fix and I'll take a pass at review.

will do


ryan-mangeno added 14 commits August 21, 2025 12:38

constants and tensor mappings for modern bert support, model not supported yet but working on getting conversion to work for encoder only · 6151592

conversion now working, hf -> gguf · 6643c5a

working on support, now working on building graph · ac67fc6

some cleanup · cc40378

cleanup · 41b6864

continuing · cc3d7ab

correct tensor shape for qkv · 4ceb828

fixed tensor mappings and working on buildin graph · 18c0c23

tensor debugging now works -> (llama-eval-callback), instead of simulated gate split with views, GEGLU is now used which does exactly this · bffe3c9

cleanup · 8f32843

cleanup · 9805635

cleanup · 40249dd

more cleanup · 853f344

ubatch issues, the assert for checking equal seqs in llama-graph.cpp when building attention keeps failing, setting ubatch size to 1 when running llama-embedding with --ubatch-size 1 makes it work, but needs to be looked into more · 2a1c750
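
For readers following commit bffe3c9: the "gate split" it mentions can be pictured as a GEGLU over a fused projection row. The sketch below is not the code from this PR and does not use ggml. It is a minimal standalone illustration, assuming the first half of the fused row is passed through GELU and the second half acts as the multiplicative gate, and assuming the erf-based GELU variant. The names `gelu` and `geglu_split` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Erf-based GELU. Assumption: the PR does not state which GELU variant is used.
static float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Hypothetical helper: split one fused row [x | gate] into two equal halves
// and return gelu(x) * gate element-wise. Assumes the activated half comes first.
std::vector<float> geglu_split(const std::vector<float>& fused) {
    assert(fused.size() % 2 == 0);           // fused row must hold two equal halves
    const std::size_t n = fused.size() / 2;  // width of each half
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = gelu(fused[i]) * fused[n + i];
    }
    return out;
}
```

Calling `geglu_split` on a row of width 2n returns a row of width n. A fused GEGLU op would produce the same values in one step, which avoids creating two separate views of the projection output by hand.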

ryan-mangeno marked this pull request as draft August 28, 2025 17:05

github-actions bot added the python python script changes label Aug 28, 2025

added cls token per previous modern bert attempt, still working on checking out the rest · c73eb68
