31 changes: 29 additions & 2 deletions .env.example
@@ -46,10 +46,24 @@ MAX_WORKERS=30
# API Keys and External Services
# =============================================================================

# Serper API for web search and Google Scholar
# Web Search Providers (in order of quality/preference)
# The system will try each provider in order until one succeeds.
# You only need ONE provider configured, but having multiple provides fallback.

# Exa.ai - Best semantic/neural search ($10 free credits)
# Get your key from: https://exa.ai/
EXA_API_KEY=your_key

# Tavily - Purpose-built for RAG/LLMs (1,000 free requests/month)
# Get your key from: https://tavily.com/
TAVILY_API_KEY=your_key

# Serper API for Google search results (2,500 free queries)
# Get your key from: https://serper.dev/
SERPER_KEY_ID=your_key

# DuckDuckGo is always available as final fallback (FREE, no API key needed)

# Jina API for web page reading
# Get your key from: https://jina.ai/
JINA_API_KEYS=your_key
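The provider fallback described in the comments above (Exa → Tavily → Serper → DuckDuckGo) can be sketched as follows. This is a minimal illustration, not the repo's actual code; `search_with_fallback` and the per-provider search functions are hypothetical names.

```python
import os

def search_with_fallback(query, providers):
    """Try each (name, env_var, search_fn) provider in order; first success wins.

    env_var is None for keyless providers such as DuckDuckGo.
    """
    for name, env_var, search_fn in providers:
        # Skip providers whose API key is not configured.
        if env_var and not os.getenv(env_var):
            continue
        try:
            return name, search_fn(query)
        except Exception:
            continue  # provider failed; fall through to the next one
    raise RuntimeError("all search providers failed")
```

With this shape, the provider list mirrors the order above, e.g. `[("exa", "EXA_API_KEY", exa_search), ("tavily", "TAVILY_API_KEY", tavily_search), ("serper", "SERPER_KEY_ID", serper_search), ("duckduckgo", None, ddg_search)]`, so DuckDuckGo always remains as the final keyless fallback.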
@@ -95,4 +109,17 @@ IDP_KEY_SECRET=your_idp_key_secret

# These are typically set by distributed training frameworks
# WORLD_SIZE=1
# RANK=0

# =============================================================================
# llama.cpp Local Inference (Alternative for Mac/Local Users)
# =============================================================================
# If using the llama.cpp local inference option instead of vLLM:

# The llama.cpp server URL (default works if using start_llama_server.sh)
LLAMA_SERVER_URL=http://127.0.0.1:8080

# For llama.cpp mode:
# - Web search uses DuckDuckGo by default (FREE, no API key needed)
# - JINA_API_KEYS is optional but recommended for better page reading
# - See: python inference/interactive_llamacpp.py --help
49 changes: 49 additions & 0 deletions README.md
@@ -179,6 +179,55 @@ You need to modify the following in the file [inference/react_agent.py](https://
- Change the model name to alibaba/tongyi-deepresearch-30b-a3b.
- Adjust the way content is concatenated, as described in the comments on lines **88–90**.


---

### 7. Local Inference with llama.cpp (Optional)

> **For Mac users or anyone who wants 100% local inference without vLLM/CUDA dependencies.**

This repo includes support for running DeepResearch locally using [llama.cpp](https://github.com/ggerganov/llama.cpp) with Metal (Apple Silicon) or CUDA acceleration. Zero API costs, full privacy.

#### Requirements

- llama.cpp built with Metal or CUDA support
- GGUF model: [bartowski/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-GGUF](https://huggingface.co/bartowski/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-GGUF)
- 32GB+ RAM (for Q4_K_M quantization)

#### Quick Start

```bash
# Install minimal dependencies
pip install -r requirements-local.txt

# Build llama.cpp (Mac with Metal)
cd llama.cpp
cmake -B build -DLLAMA_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cd ..

# Download model (~18GB)
mkdir -p models/gguf
curl -L -o models/gguf/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q4_K_M.gguf \
'https://huggingface.co/bartowski/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-GGUF/resolve/main/Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q4_K_M.gguf'

# Terminal 1: Start the server
./scripts/start_llama_server.sh

# Terminal 2: Run research queries
python inference/interactive_llamacpp.py
```

The llama.cpp server provides both an API and a web UI at http://localhost:8080.
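The llama.cpp server speaks the OpenAI-compatible `/v1/chat/completions` protocol, so a minimal client can be sketched with only the standard library. This is an illustrative sketch, not part of the repo; the `model` value is a placeholder (llama.cpp serves whatever model it was started with).

```python
import json
import urllib.request

def chat_request(base_url, prompt, temperature=0.7):
    """Build an OpenAI-style chat completion request for the llama.cpp server."""
    payload = {
        "model": "local",  # placeholder; llama.cpp ignores it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending it requires the server from Terminal 1 to be running:
# resp = urllib.request.urlopen(chat_request("http://localhost:8080", "hi"))
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
```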

#### Features

- **Free web search**: Uses DuckDuckGo (no API key required)
- **Page visiting**: Uses Jina Reader (optional API key for better results)
- **Loop detection**: Prevents infinite tool call cycles
- **32K context**: Long research sessions supported
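The loop-detection feature above could work along these lines; a sketch under the assumption that a "loop" means the same tool called with the same arguments several times in a row (the repo's actual heuristic may differ):

```python
from collections import deque

class LoopDetector:
    """Flags when the same tool call repeats max_repeats times in a row."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=max_repeats)

    def check(self, tool_name, args):
        """Record a call; return True once it has repeated max_repeats times."""
        self.recent.append((tool_name, repr(args)))
        return (len(self.recent) == self.max_repeats
                and len(set(self.recent)) == 1)
```

When `check` returns True, the agent can break the cycle, e.g. by injecting a message telling the model to try a different tool or query.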

---
## Benchmark Evaluation

We provide benchmark evaluation scripts for various datasets. Please refer to the [evaluation scripts](./evaluation/) directory for more details.