diff --git a/README.md b/README.md
index c59b74fb..dff134d5 100644
--- a/README.md
+++ b/README.md
@@ -411,12 +411,12 @@ npx @zilliz/claude-context-mcp@latest
 
-For more detailed MCP environment variable configuration, see our [Environment Variables Guide](docs/getting-started/environment-variables.md).
-
-📚 **Need more help?** Check out our [complete documentation](docs/) for detailed guides and troubleshooting tips.
-
 ---
 
+**Configuring environment variables for MCP:** For detailed MCP environment variable configuration, see our [Environment Variables Guide](docs/getting-started/environment-variables.md).
+
+**Using different embedding models with MCP:** To configure a specific embedding model (e.g., `text-embedding-3-large` for OpenAI or `voyage-code-3` for VoyageAI), see the [MCP Configuration Examples](packages/mcp/README.md#embedding-provider-configuration) for detailed setup instructions for each provider.
+
+📚 **Need more help?** Check out our [complete documentation](docs/) for detailed guides and troubleshooting tips.
 
 ---
diff --git a/docs/getting-started/environment-variables.md b/docs/getting-started/environment-variables.md
index ed55fd8f..e4d9c348 100644
--- a/docs/getting-started/environment-variables.md
+++ b/docs/getting-started/environment-variables.md
@@ -21,21 +21,37 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `EMBEDDING_PROVIDER` | Provider: `OpenAI`, `VoyageAI`, `Gemini`, `Ollama` | `OpenAI` |
+| `EMBEDDING_MODEL` | Embedding model name (works for all providers) | Provider-specific default |
 | `OPENAI_API_KEY` | OpenAI API key | Required for OpenAI |
 | `VOYAGEAI_API_KEY` | VoyageAI API key | Required for VoyageAI |
 | `GEMINI_API_KEY` | Gemini API key | Required for Gemini |
 
+> **💡 Note:** `EMBEDDING_MODEL` is a universal environment variable that works with all embedding providers. Simply set it to the model name you want to use (e.g., `text-embedding-3-large` for OpenAI or `voyage-code-3` for VoyageAI).
+
+> **Supported Model Names:**
+>
+> - OpenAI Models: See `getSupportedModels` in [`openai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/openai-embedding.ts) for the full list of supported models.
+>
+> - VoyageAI Models: See `getSupportedModels` in [`voyageai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/voyageai-embedding.ts) for the full list of supported models.
+>
+> - Gemini Models: See `getSupportedModels` in [`gemini-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/gemini-embedding.ts) for the full list of supported models.
+>
+> - Ollama Models: Depends on the models you have installed locally.
+
+> **📖 For detailed provider-specific configuration examples and setup instructions, see the [MCP Configuration Guide](../../packages/mcp/README.md#embedding-provider-configuration).**
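+
+For example, a minimal `~/.context/.env` that pins both the provider and the model via `EMBEDDING_MODEL` might look like this (a sketch with placeholder values; swap in the API key variable and model name for your provider):
+
+```bash
+# Illustrative values only; any supported provider works the same way
+EMBEDDING_PROVIDER=VoyageAI
+VOYAGEAI_API_KEY=pa-your-voyageai-api-key
+EMBEDDING_MODEL=voyage-code-3
+MILVUS_TOKEN=your-zilliz-cloud-api-key
+```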
+
 ### Vector Database
 
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `MILVUS_TOKEN` | Milvus authentication token. Get [Zilliz Personal API Key](https://github.com/zilliztech/claude-context/blob/master/assets/signup_and_get_apikey.png) | Recommended |
 | `MILVUS_ADDRESS` | Milvus server address. Optional when using Zilliz Personal API Key | Auto-resolved from token |
 
-### Ollama (Local)
+### Ollama (Optional)
 
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `OLLAMA_HOST` | Ollama server URL | `http://127.0.0.1:11434` |
-| `OLLAMA_MODEL` | Model name | `nomic-embed-text` |
+| `OLLAMA_MODEL` (alternative to `EMBEDDING_MODEL`) | Model name | `nomic-embed-text` |
+
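+A minimal Ollama setup might look like this (a sketch; `nomic-embed-text` is only an example, use whichever embedding model you have pulled locally):
+
+```bash
+EMBEDDING_PROVIDER=Ollama
+OLLAMA_HOST=http://127.0.0.1:11434
+OLLAMA_MODEL=nomic-embed-text
+MILVUS_TOKEN=your-zilliz-cloud-api-key
+```
+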
 ### Advanced Configuration
 
 | Variable | Description | Default |
@@ -54,6 +70,7 @@ mkdir -p ~/.context
 cat > ~/.context/.env << 'EOF'
 EMBEDDING_PROVIDER=OpenAI
 OPENAI_API_KEY=sk-your-openai-api-key
+EMBEDDING_MODEL=text-embedding-3-small
 MILVUS_TOKEN=your-zilliz-cloud-api-key
 EOF
 ```
diff --git a/packages/mcp/README.md b/packages/mcp/README.md
index fe6c2c16..e6032fdb 100644
--- a/packages/mcp/README.md
+++ b/packages/mcp/README.md
@@ -31,7 +31,7 @@ Before using the MCP server, make sure you have:
 
 Claude Context MCP supports multiple embedding providers. Choose the one that best fits your needs:
 
-> 💡 **Tip**: You can also use [global environment variables](../../docs/getting-started/environment-variables.md) for easier configuration management across different MCP clients.
+> 📋 **Quick Reference**: For a complete list of environment variables and their descriptions, see the [Environment Variables Guide](../../docs/getting-started/environment-variables.md).
 
 ```bash
 # Supported providers: OpenAI, VoyageAI, Gemini, Ollama
@@ -55,9 +55,7 @@ OPENAI_BASE_URL=https://api.openai.com/v1
 ```
 
 **Available Models:**
-- `text-embedding-3-small` (1536 dimensions, faster, lower cost)
-- `text-embedding-3-large` (3072 dimensions, higher quality)
-- `text-embedding-ada-002` (1536 dimensions, legacy model)
+See `getSupportedModels` in [`openai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/openai-embedding.ts) for the full list of supported models.
 
 **Getting API Key:**
 1. Visit [OpenAI Platform](https://platform.openai.com/api-keys)
@@ -81,9 +79,7 @@ EMBEDDING_MODEL=voyage-code-3
 ```
 
 **Available Models:**
-- `voyage-code-3` (1024 dimensions, optimized for code)
-- `voyage-3` (1024 dimensions, general purpose)
-- `voyage-3-lite` (512 dimensions, faster inference)
+See `getSupportedModels` in [`voyageai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/voyageai-embedding.ts) for the full list of supported models.
 
 **Getting API Key:**
 1. Visit [VoyageAI Console](https://dash.voyageai.com/)
@@ -107,7 +103,7 @@ EMBEDDING_MODEL=gemini-embedding-001
 ```
 
 **Available Models:**
-- `gemini-embedding-001` (3072 dimensions, latest model)
+See `getSupportedModels` in [`gemini-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/gemini-embedding.ts) for the full list of supported models.
 
 **Getting API Key:**
 1. Visit [Google AI Studio](https://aistudio.google.com/)
@@ -130,11 +126,6 @@ EMBEDDING_MODEL=nomic-embed-text
 OLLAMA_HOST=http://127.0.0.1:11434
 ```
 
-**Available Models:**
-- `nomic-embed-text` (768 dimensions, recommended for code)
-- `mxbai-embed-large` (1024 dimensions, higher quality)
-- `all-minilm` (384 dimensions, lightweight)
-
 **Setup Instructions:**
 1. Install Ollama from [ollama.ai](https://ollama.ai/)
 2. Pull the embedding model:
@@ -557,18 +548,19 @@ npx @zilliz/claude-context-mcp@latest
 
 ## Features
 
-- 🔌 MCP Protocol Compliance: Full compatibility with MCP-enabled AI assistants and agents
-- 🔍 Semantic Code Search: Natural language queries to find relevant code snippets
-- 📁 Codebase Indexing: Index entire codebases for fast semantic search
-- 🔄 Auto-Sync: Automatically detects and synchronizes file changes to keep index up-to-date
-- 🧠 AI-Powered: Uses OpenAI embeddings and Milvus vector database
-- ⚡ Real-time: Interactive indexing and searching with progress feedback
-- 🛠️ Tool-based: Exposes three main tools via MCP protocol
+- 🔌 **MCP Protocol Compliance**: Full compatibility with MCP-enabled AI assistants and agents
+- 🔍 **Hybrid Code Search**: Natural language queries with hybrid search (BM25 + dense vector) to find relevant code snippets
+- 📁 **Codebase Indexing**: Index entire codebases for fast hybrid search across millions of lines of code
+- 🔄 **Incremental Indexing**: Efficiently re-index only changed files, using Merkle trees for auto-sync
+- 🧩 **Intelligent Code Chunking**: AST-based code analysis for syntax-aware chunking, with automatic fallback
+- 🗄️ **Scalable**: Integrates with Zilliz Cloud for scalable vector search, no matter how large your codebase is
+- 🛠️ **Customizable**: Configure file extensions, ignore patterns, and embedding models
+- ⚡ **Real-time**: Interactive indexing and searching with progress feedback
 
 ## Available Tools
 
 ### 1. `index_codebase`
-Index a codebase directory for semantic search.
+Index a codebase directory for hybrid search (BM25 + dense vector).
 
 **Parameters:**
 - `path` (required): Absolute path to the codebase directory to index
@@ -578,12 +570,13 @@ Index a codebase directory for hybrid search (BM25 + dense vector).
 - `ignorePatterns` (optional): Additional ignore patterns to exclude specific files/directories beyond defaults (e.g., ['static/**', '*.tmp', 'private/**']) (default: [])
 
 ### 2. `search_code`
-Search the indexed codebase using natural language queries.
+Search the indexed codebase using natural language queries with hybrid search (BM25 + dense vector).
 
 **Parameters:**
 - `path` (required): Absolute path to the codebase directory to search in
 - `query` (required): Natural language query to search for in the codebase
 - `limit` (optional): Maximum number of results to return (default: 10, max: 50)
+- `extensionFilter` (optional): List of file extensions to filter results by (e.g., ['.ts', '.py']) (default: [])
 
 ### 3. `clear_index`
 Clear the search index for a specific codebase.
@@ -591,6 +584,12 @@ Clear the search index for a specific codebase.
 
 **Parameters:**
 - `path` (required): Absolute path to the codebase directory to clear index for
 
+### 4. `get_indexing_status`
+Get the current indexing status of a codebase: the progress percentage while indexing is still running, and the completion status once the codebase is fully indexed.
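+
+To exercise these tools by hand (outside an AI assistant), one option is the MCP Inspector. This is a sketch and assumes your provider API key and `MILVUS_TOKEN` are already exported in your shell or set in `~/.context/.env`:
+
+```bash
+# Launch the MCP Inspector UI against the Claude Context MCP server
+npx @modelcontextprotocol/inspector npx @zilliz/claude-context-mcp@latest
+```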
+
+**Parameters:**
+- `path` (required): Absolute path to the codebase directory to check status for
+
 ## Contributing
 
diff --git a/packages/mcp/src/config.ts b/packages/mcp/src/config.ts
index 07405434..25fc6a8d 100644
--- a/packages/mcp/src/config.ts
+++ b/packages/mcp/src/config.ts
@@ -45,7 +45,7 @@ export function getDefaultModelForProvider(provider: string): string {
 export function getEmbeddingModelForProvider(provider: string): string {
     switch (provider) {
         case 'Ollama':
-            // For Ollama, prioritize OLLAMA_MODEL over EMBEDDING_MODEL
+            // For Ollama, prioritize OLLAMA_MODEL over EMBEDDING_MODEL for backward compatibility
             const ollamaModel = envManager.get('OLLAMA_MODEL') || envManager.get('EMBEDDING_MODEL') || getDefaultModelForProvider(provider);
             console.log(`[DEBUG] 🎯 Ollama model selection: OLLAMA_MODEL=${envManager.get('OLLAMA_MODEL') || 'NOT SET'}, EMBEDDING_MODEL=${envManager.get('EMBEDDING_MODEL') || 'NOT SET'}, selected=${ollamaModel}`);
             return ollamaModel;
@@ -53,8 +53,10 @@ export function getEmbeddingModelForProvider(provider: string): string {
         case 'OpenAI':
         case 'VoyageAI':
         case 'Gemini':
         default:
-            // For other providers, use EMBEDDING_MODEL or default
-            return envManager.get('EMBEDDING_MODEL') || getDefaultModelForProvider(provider);
+            // For all other providers, use EMBEDDING_MODEL or default
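+            // (An explicitly set EMBEDDING_MODEL always wins; getDefaultModelForProvider()
+            // is only consulted when no model is configured.)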
+            const selectedModel = envManager.get('EMBEDDING_MODEL') || getDefaultModelForProvider(provider);
+            console.log(`[DEBUG] 🎯 ${provider} model selection: EMBEDDING_MODEL=${envManager.get('EMBEDDING_MODEL') || 'NOT SET'}, selected=${selectedModel}`);
+            return selectedModel;
     }
 }
 
@@ -138,7 +140,7 @@ Environment Variables:
 
   Embedding Provider Configuration:
     EMBEDDING_PROVIDER        Embedding provider: OpenAI, VoyageAI, Gemini, Ollama (default: OpenAI)
-    EMBEDDING_MODEL           Embedding model name (auto-detected if not specified)
+    EMBEDDING_MODEL           Embedding model name (works for all providers)
 
   Provider-specific API Keys:
     OPENAI_API_KEY            OpenAI API key (required for OpenAI provider)
@@ -148,7 +150,7 @@ Environment Variables:
 
   Ollama Configuration:
    OLLAMA_HOST               Ollama server host (default: http://127.0.0.1:11434)
-    OLLAMA_MODEL              Ollama model name (default: nomic-embed-text)
+    OLLAMA_MODEL              Ollama model name (alternative to EMBEDDING_MODEL for Ollama)
 
   Vector Database Configuration:
     MILVUS_ADDRESS            Milvus address (optional, can be auto-resolved from token)
@@ -158,16 +160,19 @@ Examples:
   # Start MCP server with OpenAI (default) and explicit Milvus address
   OPENAI_API_KEY=sk-xxx MILVUS_ADDRESS=localhost:19530 npx @zilliz/claude-context-mcp@latest
 
-  # Start MCP server with OpenAI and auto-resolve Milvus address from token
-  OPENAI_API_KEY=sk-xxx MILVUS_TOKEN=your-zilliz-token npx @zilliz/claude-context-mcp@latest
+  # Start MCP server with OpenAI and a specific model
+  OPENAI_API_KEY=sk-xxx EMBEDDING_MODEL=text-embedding-3-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
 
-  # Start MCP server with VoyageAI
-  EMBEDDING_PROVIDER=VoyageAI VOYAGEAI_API_KEY=pa-xxx MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
+  # Start MCP server with VoyageAI and a specific model
+  EMBEDDING_PROVIDER=VoyageAI VOYAGEAI_API_KEY=pa-xxx EMBEDDING_MODEL=voyage-3-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
 
-  # Start MCP server with Gemini
-  EMBEDDING_PROVIDER=Gemini GEMINI_API_KEY=xxx MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
+  # Start MCP server with Gemini and a specific model
+  EMBEDDING_PROVIDER=Gemini GEMINI_API_KEY=xxx EMBEDDING_MODEL=gemini-embedding-001 MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
 
-  # Start MCP server with Ollama
+  # Start MCP server with Ollama and a specific model (using OLLAMA_MODEL)
+  EMBEDDING_PROVIDER=Ollama OLLAMA_MODEL=mxbai-embed-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
+
+  # Start MCP server with Ollama and a specific model (using EMBEDDING_MODEL)
   EMBEDDING_PROVIDER=Ollama EMBEDDING_MODEL=nomic-embed-text MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
 `);
 }
\ No newline at end of file