diff --git a/gallery/index.yaml b/gallery/index.yaml
index 059f3503871d..4e994e940907 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -22599,3 +22599,51 @@
     - filename: Almost-Human-X3-32bit-1839-6B.i1-Q4_K_M.gguf
       sha256: 5dc9766b505d98d7a5ad960b321c1fafe508734ca12ff4b7c480f8afbbc1e03b
       uri: huggingface://mradermacher/Almost-Human-X3-32bit-1839-6B-i1-GGUF/Almost-Human-X3-32bit-1839-6B.i1-Q4_K_M.gguf
+- !!merge <<: *jamba
+  name: "ai21-jamba-reasoning-3b"
+  urls:
+    - https://huggingface.co/Mungert/AI21-Jamba-Reasoning-3B-GGUF
+  description: |
+    **Model Name:** AI21 Jamba Reasoning 3B
+    **Repository:** [ai21labs/AI21-Jamba-Reasoning-3B-GGUF](https://huggingface.co/ai21labs/AI21-Jamba-Reasoning-3B-GGUF)
+    **License:** Apache 2.0
+    **Pipeline:** Text Generation
+    **Architecture:** Hybrid Transformer–Mamba (28 layers: 26 Mamba, 2 Attention)
+    **Parameters:** 3B
+    **Context Length:** Up to 256,000 tokens
+    **Quantizations Available:** FP16 (6.4 GB), Q4_K_M (1.93 GB)
+    **Developed by:** AI21
+
+    ---
+
+    ### 🔍 **Overview**
+    AI21’s Jamba Reasoning 3B is a compact, high-performance reasoning model that combines the strengths of the Transformer and Mamba architectures for efficient long-context processing. Despite its small size, it delivers top-tier intelligence scores—surpassing models like Llama 3.2 3B and Gemma 3 4B—on benchmarks such as MMLU-Pro and IFBench.
+
+    ### ⚡ **Key Features**
+    - **Ultra-long context (256K tokens):** Mamba layers enable scalable, memory-efficient handling of extremely long inputs.
+    - **High efficiency:** Optimized for edge devices, laptops, and low-resource environments without sacrificing performance.
+    - **Strong reasoning capabilities:** Fine-tuned for structured reasoning, tool use, code generation, and complex problem solving.
+    - **Multiple runtime support:** Works with `llama.cpp`, `vLLM`, `Ollama`, and `LM Studio`.
+
+    ### 📊 **Performance Highlights**
+    - **MMLU-Pro:** 61.0%
+    - **Humanity’s Last Exam:** 6.0%
+    - **IFBench:** 52.0%
+
+    > ✅ Ideal for use cases requiring fast, accurate reasoning over long documents—e.g., document analysis, code generation, customer support ticket triage, and research summarization.
+
+    ### 📥 **Quick Start**
+    Use with `llama.cpp` or `vLLM` for local inference. Pre-quantized GGUF files are available for lightweight deployment.
+
+    > 📌 **Note:** This description covers the *original* model from AI21. The repository `Mungert/AI21-Jamba-Reasoning-3B-GGUF` used by this entry is a user-quantized version; refer to `ai21labs/AI21-Jamba-Reasoning-3B-GGUF` for the official model description.
+
+    ---
+
+    🔗 **Learn More:** [AI21 Blog – Introducing Jamba Reasoning 3B](https://www.ai21.com/blog/introducing-jamba-reasoning-3B)
+  overrides:
+    parameters:
+      model: AI21-Jamba-Reasoning-3B-q4_k_m.gguf
+  files:
+    - filename: AI21-Jamba-Reasoning-3B-q4_k_m.gguf
+      sha256: a5ad6704bb679010ffeea5a2a8c06e652ce7dec0c955f8945586ba2f5a643cc7
+      uri: huggingface://Mungert/AI21-Jamba-Reasoning-3B-GGUF/AI21-Jamba-Reasoning-3B-q4_k_m.gguf
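The Quick Start in the description can be made concrete. A minimal standalone sketch, assuming the `llama-cli` binary from a recent `llama.cpp` build and the `huggingface-cli` tool are installed; the repo and file names come from the entry above, while the prompt and token count are illustrative:

```shell
# Fetch the Q4_K_M quantization (~1.93 GB) from the user-quantized repo
huggingface-cli download Mungert/AI21-Jamba-Reasoning-3B-GGUF \
  AI21-Jamba-Reasoning-3B-q4_k_m.gguf --local-dir ./models

# Run a short reasoning prompt locally; -n caps the number of generated tokens
llama-cli -m ./models/AI21-Jamba-Reasoning-3B-q4_k_m.gguf \
  -p "Explain step by step why 17 * 24 = 408." -n 256
```

These commands bypass the gallery mechanism entirely and exercise the GGUF file directly, which is useful for verifying the download before wiring it into a gallery-driven install.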