Add LLaDA 8b Diffusion model #14771
base: master
Conversation
llama: fix llama-model fixup working (commits 77fc759 to e4b7346)
convert_hf_to_gguf.py (outdated diff)
```python
def set_vocab(self):
    try:
        self._set_vocab_sentencepiece()
    except FileNotFoundError:
        try:
            self._set_vocab_llama_hf()
        except (FileNotFoundError, TypeError):
            # Llama 3
            self._set_vocab_gpt2()

    # Apply to CodeLlama only (and ignore for Llama 3 with a vocab size of 128256)
    if self.hparams.get("vocab_size", 32000) == 32016:
        special_vocab = gguf.SpecialVocab(
            self.dir_model, load_merges=False,
            special_token_types=['prefix', 'suffix', 'middle', 'eot']
        )
        special_vocab._set_special_token("prefix", 32007)
        special_vocab._set_special_token("suffix", 32008)
        special_vocab._set_special_token("middle", 32009)
        special_vocab._set_special_token("eot", 32010)
        special_vocab.add_to_gguf(self.gguf_writer)

    tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
    if tokenizer_config_file.is_file():
        with open(tokenizer_config_file, "r", encoding="utf-8") as f:
            tokenizer_config_json = json.load(f)
        if "add_prefix_space" in tokenizer_config_json:
            self.gguf_writer.add_add_space_prefix(tokenizer_config_json["add_prefix_space"])

    # Apply to granite small models only
    if self.hparams.get("vocab_size", 32000) == 49152:
        self.gguf_writer.add_add_bos_token(False)
```
Ok, this is clearly an error...
Yeah, sorry, this was copied from the llama architecture; I've removed it. I tried to subclass the Llama arch, but it didn't work for me because of the many changes in set_gguf_parameters. Perhaps it could still work by just subclassing, as that is probably the correct way to do it.
I'm confused, what were you trying to achieve?
I was trying to re-use the llama architecture for this, since it's the same architecture with different params. I wasn't sure of the correct way to do it, which led to some trial and error with my GGUF; in the end I just copied `class LlamaModel(TextModel)` but forgot to remove the other stuff. Can you tell me what the right way to do this is?
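For illustration, here is a minimal self-contained sketch of the subclassing approach being discussed. Note this is an assumption about the pattern, not the real convert_hf_to_gguf.py API: `LlamaModel` below is a stub, and the parameter names are placeholders; only the override pattern is the point.

```python
# Hedged sketch: reuse a parent converter class and override only what
# differs, instead of copying the whole class body.

class LlamaModel:
    """Stub standing in for the existing Llama converter class."""
    def set_gguf_parameters(self):
        # pretend these are the parameters the Llama converter writes
        self.params = {"rope_theta": 10000.0, "vocab_size": 128256}

class LLaDAModel(LlamaModel):
    """Reuse the Llama conversion logic, adding only LLaDA specifics."""
    def set_gguf_parameters(self):
        super().set_gguf_parameters()  # inherit the shared parameters
        # hypothetical LLaDA-specific override/addition
        self.params["diffusion_steps"] = 128

model = LLaDAModel()
model.set_gguf_parameters()
```

With this shape, future fixes to the shared Llama conversion logic would flow into the subclass automatically, rather than diverging in a copied class.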
Continuing on #14644, this PR adds another diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct, which has different semantics compared to the Dream-7B model and overall seems to have better performance.
There are very few similarities in how the two models generate tokens, so for now I've created two separate examples: llama-diffusion-dream-cli (for the earlier model) and llama-diffusion-llada-cli (for running the new LLaDA model). I've added a README as well, and uploaded a GGUF.
Example command:

```shell
./build/bin/llama-diffusion-llada-cli -m llada-8b.gguf -p "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?" --diffusion_steps 128 -ngl 99 --temp 0 -ub 128 --diffusion-visual
```
I would also like to add this to the server, but I'm not sure what API would be acceptable, so I'm hoping to have a discussion on that as well.