Add LLaDA 8b Diffusion model #14771
base: master
Conversation
llama: fix llama-model fixup working (commits 77fc759 to e4b7346)
convert_hf_to_gguf.py (outdated diff)
```python
def set_vocab(self):
    try:
        self._set_vocab_sentencepiece()
    except FileNotFoundError:
        try:
            self._set_vocab_llama_hf()
        except (FileNotFoundError, TypeError):
            # Llama 3
            self._set_vocab_gpt2()

    # Apply to CodeLlama only (and ignore for Llama 3 with a vocab size of 128256)
    if self.hparams.get("vocab_size", 32000) == 32016:
        special_vocab = gguf.SpecialVocab(
            self.dir_model, load_merges=False,
            special_token_types=['prefix', 'suffix', 'middle', 'eot']
        )
        special_vocab._set_special_token("prefix", 32007)
        special_vocab._set_special_token("suffix", 32008)
        special_vocab._set_special_token("middle", 32009)
        special_vocab._set_special_token("eot", 32010)
        special_vocab.add_to_gguf(self.gguf_writer)

    tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
    if tokenizer_config_file.is_file():
        with open(tokenizer_config_file, "r", encoding="utf-8") as f:
            tokenizer_config_json = json.load(f)
        if "add_prefix_space" in tokenizer_config_json:
            self.gguf_writer.add_add_space_prefix(tokenizer_config_json["add_prefix_space"])

    # Apply to granite small models only
    if self.hparams.get("vocab_size", 32000) == 49152:
        self.gguf_writer.add_add_bos_token(False)
```
Ok, this is clearly an error...
Yeah, sorry, this was copied from the llama architecture; I've removed it. I tried to subclass the Llama arch, but it didn't work for me because of the many changes in set_gguf_parameters. Perhaps it could still work by just subclassing, as that is probably the correct way to do it.
I'm confused, what were you trying to achieve?
I was trying to re-use the llama architecture for this, since it's the same architecture with different params. I wasn't sure of the correct way to do it, which led to some trial and error with my GGUF; in the end I just copied `class LlamaModel(TextModel)` but forgot to remove the other stuff. Can you tell me what the right way to do this is?
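For illustration, here is a minimal self-contained sketch of the subclassing approach being discussed. Note this is an assumption about the pattern, not the real convert_hf_to_gguf.py API: `LlamaModel` below is a stub, and the parameter names are placeholders; only the override pattern is the point.

```python
# Hedged sketch: reuse a parent converter class and override only what
# differs, instead of copying the whole class body.

class LlamaModel:
    """Stub standing in for the existing Llama converter class."""
    def set_gguf_parameters(self):
        # pretend these are the parameters the Llama converter writes
        self.params = {"rope_theta": 10000.0, "vocab_size": 128256}

class LLaDAModel(LlamaModel):
    """Reuse the Llama conversion logic, adding only LLaDA specifics."""
    def set_gguf_parameters(self):
        super().set_gguf_parameters()  # inherit the shared parameters
        # hypothetical LLaDA-specific override/addition
        self.params["diffusion_steps"] = 128

model = LLaDAModel()
model.set_gguf_parameters()
```

With this shape, future fixes to the shared Llama conversion logic would flow into the subclass automatically, rather than diverging in a copied class.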
Continuing on #14644, this PR adds another diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct, which has different semantics compared to the Dream-7B model and overall seems to have better performance.
There are very few similarities in how the two models generate tokens, so for now I've created two separate examples: llama-diffusion-dream-cli (for the earlier model) and llama-diffusion-llada-cli (for running the new LLaDA model). I've added a README as well, and uploaded a GGUF.
Example command:

```shell
./build/bin/llama-diffusion-llada-cli -m llada-8b.gguf -p "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?" --diffusion_steps 128 -ngl 99 --temp 0 -ub 128 --diffusion-visual
```
I would also like to add this to the server, but I'm not sure what API would be acceptable, so I'm hoping to have a discussion on that as well.