Add README

am17an · am17an · commit e362c1475a33 · 2025-07-19T17:40:15.000+08:00
diff --git a/common/arg.cpp b/common/arg.cpp
@@ -3480,7 +3480,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         [](common_params & params, const std::string & value) { params.diffusion_llada.cfg_scale = std::stof(value); }
     ).set_examples({ LLAMA_EXAMPLE_DIFFUSION_LLADA }));
     add_opt(common_arg(
-        { "--diffusion-remasking-alg" }, "N",
+        { "--diffusion-alg" }, "N",
         string_format("remasking algorithm: 0=LOW_CONFIDENCE, 1=RANDOM (default: %d)", params.diffusion_llada.remasking),
         [](common_params & params, int value) { params.diffusion_llada.remasking = value; }
     ).set_examples({ LLAMA_EXAMPLE_DIFFUSION_LLADA }));
diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
@@ -2943,15 +2943,15 @@ def set_gguf_parameters(self):
         self.gguf_writer.add_rope_dimension_count(rope_dim)
 
         # Set context length for LLaDA
-        context_length = self.hparams.get("max_sequence_length")
+        context_length = self.hparams.get("max_sequence_length", 4096)
         self.gguf_writer.add_context_length(context_length)
 
         # Set embedding length (dimension size)
-        embedding_length = self.hparams.get("d_model")
+        embedding_length = self.hparams.get("d_model", 4096)
         self.gguf_writer.add_embedding_length(embedding_length)
 
         # Set feed forward length (MLP hidden size)
-        feed_forward_length = self.hparams.get("mlp_hidden_size")
+        feed_forward_length = self.hparams.get("mlp_hidden_size", 12288)
         self.gguf_writer.add_feed_forward_length(feed_forward_length)
 
         # Set RoPE parameters
diff --git a/examples/diffusion/README.md b/examples/diffusion/README.md
@@ -0,0 +1,39 @@
+# Diffusion Text Generation Examples
+
+This directory contains diffusion-based text generation examples for two model architectures: **Dream** and **LLaDA-8B**. Both generate text by iterative denoising, but they use different sampling strategies.
+
+## Supported Models
+
+### 1. Dream Model (`llama-diffusion-dream-cli`)
+
+- https://huggingface.co/Dream-org/Dream-v0-Base-7B
+- Original PR - https://github.com/ggml-org/llama.cpp/pull/14644
+
+The Dream model supports four sampling algorithms, selected with the `--diffusion-alg` parameter:
+
+1. **ORIGIN (0)** - Original diffusion algorithm
+   - Uses probability transfer based on timestep ratios
+   - Default algorithm with standard confidence-based token selection
+
+2. **MASKGIT_PLUS (1)** - Enhanced MaskGIT sampling
+   - Improved version of the MaskGIT algorithm
+
+3. **TOPK_MARGIN (2)** - Top-K margin-based sampling
+   - Confidence calculated as the margin between top-1 and top-2 probabilities
+
+4. **ENTROPY (3)** - Entropy-based sampling (recommended)
+   - Uses entropy calculation for confidence estimation
+
+### 2. LLaDA-8B Model (`llama-diffusion-llada-cli`)
+
+- https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct
+
+#### LLaDA Model Remasking Strategies
+
+The LLaDA model supports two remasking strategies, selected with the `--diffusion-alg` parameter:
+
+1. **REMASKING_LOW_CONFIDENCE (0)** - Default strategy
+   - Remasks the tokens with the lowest confidence scores
+   - Uses softmax probabilities to determine confidence
+
+2. **REMASKING_RANDOM (1)** - Random remasking
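
Note (not part of the diff): the confidence measures and remasking strategies the README describes can be illustrated with a small standalone sketch. This is not the code in `examples/diffusion`. Every function name below is hypothetical, and the use of negative entropy as the entropy-based confidence score is an assumption about how such a score is typically defined. The sketch only shows the general idea: turn logits into probabilities, score each position, then choose which positions to mask again.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_top1(probs):
    """Confidence as the probability of the most likely token."""
    return max(probs)


def confidence_margin(probs):
    """Confidence as the gap between the top-1 and top-2 probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def confidence_neg_entropy(probs):
    """Confidence as negative entropy (assumed definition): peaked is higher."""
    return sum(p * math.log(p) for p in probs if p > 0.0)


def positions_to_remask(confidences, count, strategy="low_confidence", rng=None):
    """Pick `count` positions to mask again for the next denoising step."""
    indices = list(range(len(confidences)))
    if strategy == "random":
        rng = rng or random.Random()
        return sorted(rng.sample(indices, count))
    # low_confidence: remask the positions the model is least sure about
    return sorted(sorted(indices, key=lambda i: confidences[i])[:count])


# Toy example: three positions, vocabulary of three tokens.
logits_per_position = [[4.0, 0.5, 0.1], [1.0, 0.9, 0.8], [0.2, 3.0, 0.1]]
probs = [softmax(row) for row in logits_per_position]
conf = [confidence_top1(p) for p in probs]
print(positions_to_remask(conf, 1))  # [1]: position 1 has the flattest distribution
```

Swapping `confidence_top1` for `confidence_margin` or `confidence_neg_entropy` changes only how positions are scored; the selection step stays the same.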