Add LLaDA 8b Diffusion model #14771


Open — am17an wants to merge 3 commits into master from add_llada_8b

Conversation

@am17an (Collaborator) commented Jul 19, 2025

Continuing from #14644, this PR adds another diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct, which has different generation semantics than the Dream-7B model and overall appears to perform better.

There are very few similarities in how the two models generate tokens, so for now I've created two separate examples: llama-diffusion-dream-cli (for the earlier model) and llama-diffusion-llada-cli (for the new LLaDA model). I've also added a README.

I've uploaded a GGUF.

Example command
./build/bin/llama-diffusion-llada-cli -m llada-8b.gguf -p "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?" --diffusion_steps 128 -ngl 99 --temp 0 -ub 128 --diffusion-visual

I would also like to add this to the server, but I'm not sure what API would be acceptable, so I'm hoping to have a discussion on that as well.

@github-actions github-actions bot added examples python python script changes labels Jul 19, 2025
@am17an am17an requested a review from ggerganov July 19, 2025 10:06
llama: fix llama-model

fixup

working
@am17an am17an requested a review from CISC July 19, 2025 11:05
@am17an am17an force-pushed the add_llada_8b branch 2 times, most recently from 77fc759 to e4b7346 Compare July 19, 2025 14:41
Comment on lines 2902 to 2933
    def set_vocab(self):
        try:
            self._set_vocab_sentencepiece()
        except FileNotFoundError:
            try:
                self._set_vocab_llama_hf()
            except (FileNotFoundError, TypeError):
                # Llama 3
                self._set_vocab_gpt2()

        # Apply to CodeLlama only (and ignore for Llama 3 with a vocab size of 128256)
        if self.hparams.get("vocab_size", 32000) == 32016:
            special_vocab = gguf.SpecialVocab(
                self.dir_model, load_merges=False,
                special_token_types=['prefix', 'suffix', 'middle', 'eot']
            )
            special_vocab._set_special_token("prefix", 32007)
            special_vocab._set_special_token("suffix", 32008)
            special_vocab._set_special_token("middle", 32009)
            special_vocab._set_special_token("eot",    32010)
            special_vocab.add_to_gguf(self.gguf_writer)

        tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
        if tokenizer_config_file.is_file():
            with open(tokenizer_config_file, "r", encoding="utf-8") as f:
                tokenizer_config_json = json.load(f)
                if "add_prefix_space" in tokenizer_config_json:
                    self.gguf_writer.add_add_space_prefix(tokenizer_config_json["add_prefix_space"])

        # Apply to granite small models only
        if self.hparams.get("vocab_size", 32000) == 49152:
            self.gguf_writer.add_add_bos_token(False)
Collaborator

Ok, this is clearly an error...

Collaborator (Author)

Yeah, sorry. This was copied from the llama architecture; I've removed it. I tried to subclass the Llama arch, but it didn't work for me because of the many changes in set_gguf_parameters. Perhaps it can work by just subclassing, as that is probably the correct way to do it.

Collaborator

I'm confused, what were you trying to achieve?

@am17an (Collaborator, Author) commented Jul 20, 2025

I was trying to re-use the llama architecture for this, since it's the same architecture with different params. I wasn't sure of the correct way to do that, which led to some trial and error with my GGUF; in the end I just copied class LlamaModel(TextModel) but forgot to remove the other stuff. Can you tell me what the right way to do this is?
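For context, the usual pattern being discussed is to subclass the existing model class and override only what differs, rather than copy-pasting the whole class body. A minimal self-contained sketch of the idea, using stub classes and hypothetical parameter names (not the actual convert_hf_to_gguf.py code):

```python
# Sketch of the subclassing pattern discussed above. TextModel, LlamaModel,
# and LLaDAModel are stubs standing in for the real converter classes;
# the parameter names and values are hypothetical.

class TextModel:
    def set_gguf_parameters(self):
        # Base class emits the parameters shared by all text models.
        return {"context_length": 4096}


class LlamaModel(TextModel):
    def set_vocab(self):
        # The sentencepiece -> llama-hf -> gpt2 fallback chain lives here;
        # a subclass inherits it without duplicating it.
        return "llama-vocab"


class LLaDAModel(LlamaModel):
    def set_gguf_parameters(self):
        # Reuse the base parameters and add only the diffusion-specific
        # ones, instead of copying the whole method body.
        params = super().set_gguf_parameters()
        params["diffusion_steps"] = 128  # hypothetical key
        return params


model = LLaDAModel()
print(model.set_vocab())            # inherited from LlamaModel unchanged
print(model.set_gguf_parameters())  # base params plus the overridden extras
```

With this layout, a vocab fix in LlamaModel automatically reaches the subclass, which is the maintenance benefit the copy-paste approach loses.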

Labels: examples, python (python script changes)

2 participants