A Comet ML Open Source Project
This Python toolbox contains four easy-to-use command-line utilities:
- ez-mcp-server: turns a file of Python functions into an MCP server
- ez-mcp-chatbot: interactively debug MCP servers, with traces logged to Opik
- ez-mcp-eval: evaluate LLM applications using Opik's evaluation framework
- ez-mcp-optimize: optimize LLM applications using Opik's optimization framework
The ez-mcp-server provides a quick way to examine tools, signatures, descriptions, latency, and return values. Combined with the chatbot, it gives you a fast workflow for iterating on your MCP tools.
The ez-mcp-chatbot provides a quick way to examine and debug interactions between an LLM and MCP tools, with observability available through Opik. The Opik Playground lets you test prompts on datasets, run A/B tests, and more; this chatbot instead gives you an interactive command-line session and debugging tools, combined with Opik observability.
The ez-mcp-eval and ez-mcp-optimize commands provide evaluation and optimization capabilities for your LLM applications, enabling you to measure performance and automatically improve your prompts using Opik's evaluation and optimization frameworks.
pip install ez-mcp-toolbox --upgrade
ez-mcp-chatbot
This starts an ez-mcp-server (using the example tools below) and the ez-mcp-chatbot configured to use those tools.
ez-mcp-eval --prompt "Answer the question" --dataset "my-dataset" --metric "Hallucination"
This will evaluate your LLM application using Opik's evaluation framework with your dataset and chosen metrics.
You can also limit the evaluation to the first N items of the dataset:
ez-mcp-eval --prompt "Answer the question" --dataset "large-dataset" --metric "Hallucination" --num 100
You can customize the chatbot's behavior with a custom system prompt:
# Use a custom system prompt
ez-mcp-chatbot --prompt "You are a helpful coding assistant"
# Create a default configuration
ez-mcp-chatbot --init
Interactions between the LLM and the MCP tools are logged and available for examination and debugging in Opik.
The rest of this file describes these four commands.
A command-line utility for turning a regular file of Python functions or classes into a full-fledged MCP server.
Take an existing Python file of functions, such as this file, my_tools.py:
# my_tools.py
def add_numbers(a: float, b: float) -> float:
    """
    Add two numbers together.
    Args:
        a: First number to add
        b: Second number to add
    Returns:
        The sum of a and b
    """
    return a + b
def greet_user(name: str) -> str:
    """
    Greet a user with a welcoming message.
    Args:
        name: The name of the person to greet
    Returns:
        A personalized greeting message
    """
    return f"Welcome to ez-mcp-server, {name}!"
Then run the server with your custom tools:
ez-mcp-server my_tools.py
You can also load tools from installed Python modules:
ez-mcp-server opik_optimizer.utils.core
Or download tools from a URL:
ez-mcp-server https://example.com/my_tools.py
The server will automatically:
- Load all functions from your file or module (no ez_mcp_toolbox imports required)
- Convert them to MCP tools
- Generate JSON schemas from your function signatures
- Use your docstrings as tool descriptions
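As an illustration of the last two steps, the sketch below derives a tool description from a function using only the standard library. It is a minimal sketch, not ez-mcp-server's actual code: `function_to_tool_schema` and the annotation-to-type table are hypothetical, and unrecognized annotations simply fall back to `"string"`. The `inputSchema` key follows the MCP tool definition format.

```python
import inspect
import typing

# Hypothetical mapping from Python annotations to JSON Schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}


def function_to_tool_schema(func) -> dict:
    """Build an MCP-style tool description from a function's signature and docstring."""
    hints = typing.get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unrecognized types fall back to "string"
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",  # the docstring becomes the tool description
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def add_numbers(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b


print(function_to_tool_schema(add_numbers))
```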
Note: if you just launch the server, it will wait for stdio input. This is designed to run from inside a system that will dynamically start the server (see below).
ez-mcp-server [-h] [--transport {stdio,sse}] [--host HOST] [--port PORT] [--include INCLUDE] [--exclude EXCLUDE] [tools_file]
Positional arguments:
- tools_file: Path to a tools file, module name, URL to download from, or 'none' to disable tools (e.g., 'my_tools.py', 'opik_optimizer.utils.core', 'https://example.com/tools.py', or 'none') (default: DEMO)
Options:
- -h, --help: show this help message and exit
- --transport {stdio,sse}: Transport method to use (default: stdio)
- --host HOST: Host for SSE transport (default: localhost)
- --port PORT: Port for SSE transport (default: 8000)
- --include INCLUDE: Python regex pattern to include only matching tool names
- --exclude EXCLUDE: Python regex pattern to exclude matching tool names
You can control which tools are loaded using the --include and --exclude flags with Python regex patterns:
# Include only tools with "add" or "multiply" in the name
ez-mcp-server my_tools.py --include "add|multiply"
# Exclude tools with "greet" or "time" in the name
ez-mcp-server my_tools.py --exclude "greet|time"
# Use both filters together
ez-mcp-server my_tools.py --include ".*number.*" --exclude ".*square.*"
# Use with default tools
ez-mcp-server --include "add" --exclude "greet"
Filtering Logic:
- The --include filter is applied first, keeping only tools whose names match the regex pattern
- The --exclude filter is then applied, removing any tools whose names match the regex pattern
- Both filters can be used together for fine-grained control
- Invalid regex patterns will cause the server to exit with an error message
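The filtering order above can be sketched as follows. This assumes patterns are matched with `re.search`, so a pattern can match anywhere in a tool name (as the "add|multiply" example implies); `filter_tools` and the tool names are hypothetical, not ez-mcp-server's internals.

```python
import re
import sys
from typing import List, Optional


def filter_tools(names: List[str], include: Optional[str] = None, exclude: Optional[str] = None) -> List[str]:
    """Apply the --include pattern first, then the --exclude pattern."""
    try:
        include_re = re.compile(include) if include else None
        exclude_re = re.compile(exclude) if exclude else None
    except re.error as exc:
        # An invalid pattern stops the program with an error message
        sys.exit(f"Invalid regex pattern: {exc}")
    if include_re is not None:
        names = [n for n in names if include_re.search(n)]  # keep only matching names
    if exclude_re is not None:
        names = [n for n in names if not exclude_re.search(n)]  # drop matching names
    return names


# Hypothetical tool names
tools = ["add_numbers", "multiply_numbers", "square_number", "greet_user", "get_current_time"]
print(filter_tools(tools, include=".*number.*", exclude=".*square.*"))
# ['add_numbers', 'multiply_numbers']
```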
A powerful AI chatbot that integrates with Model Context Protocol (MCP) servers and provides observability through Opik tracing. This chatbot can connect to various MCP servers to access specialized tools and capabilities, making it a versatile assistant for different tasks.
- MCP Integration: Connect to multiple Model Context Protocol servers for specialized tool access
- Opik Observability: Built-in tracing and observability with Opik integration
- Interactive Chat Interface: Rich console interface with command history and auto-completion
- Python Code Execution: Execute Python code directly in the chat environment
- Tool Management: Discover and use tools from connected MCP servers
- Configurable: JSON-based configuration for models and MCP servers
- Async Support: Full asynchronous operation for better performance
The server implements the full MCP specification:
- Tool Discovery: Dynamic tool listing and metadata
- Tool Execution: Asynchronous tool calling with proper error handling
- Protocol Compliance: Full compatibility with MCP clients
- Extensibility: Easy addition of new tools and capabilities
Create a default configuration file:
ez-mcp-chatbot --init
This creates an ez-config.json file with default settings.
Edit ez-config.json to specify your model and MCP servers. For example:
{
  "model": "openai/gpt-4o-mini",
  "model_kwargs": {
    "temperature": 0.2
  },
  "mcp_servers": [
    {
      "name": "ez-mcp-server",
      "description": "Ez MCP server from Python files",
      "command": "ez-mcp-server",
      "args": ["/path/to/my_tools.py"]
    }
  ]
}
Supported model formats:
- openai/gpt-4o-mini
- anthropic/claude-3-sonnet
- google/gemini-pro
- And many more through LiteLLM
Inside the ez-mcp-chatbot, you can have a normal LLM conversation.
In addition, you have access to the following meta-commands:
- /clear: Clear the conversation history
- /help: Show available commands
- /debug on or /debug off: Toggle debug output
- /show tools: List all available tools
- /show tools SERVER: List tools for a specific server
- /run SERVER.TOOL: Execute a tool
- ! python_code: Execute Python code (e.g., '! print(2+2)')
- quit or exit: Exit the chatbot
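The meta-commands above can be illustrated with a small dispatcher that classifies each input line before the chatbot acts on it. This is a hypothetical sketch: `parse_command` and its return tuples are not part of ez-mcp-toolbox's API.

```python
def parse_command(line: str) -> tuple:
    """Classify one line of chatbot input as a meta-command, Python code, or a chat message."""
    text = line.strip()
    if text in ("quit", "exit"):
        return ("exit",)
    if text.startswith("!"):
        return ("python", text[1:].strip())
    if text == "/clear":
        return ("clear",)
    if text == "/help":
        return ("help",)
    if text in ("/debug on", "/debug off"):
        return ("debug", text == "/debug on")
    if text.startswith("/show tools"):
        server = text[len("/show tools"):].strip() or None  # None means all servers
        return ("show_tools", server)
    if text.startswith("/run "):
        # Split SERVER.TOOL on the first dot
        server, _, tool = text[len("/run "):].strip().partition(".")
        return ("run", server, tool)
    return ("chat", text)  # anything else goes to the LLM


for line in ["/show tools ez-mcp-server", "/run ez-mcp-server.add_numbers", "! print(2+2)", "What is 2 + 3?"]:
    print(parse_command(line))
```

Splitting `/run` on the first dot assumes server names contain no dots.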
Execute Python code by prefixing with !:
! print(self.messages)
! import math
! math.sqrt(16)
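One way to handle a `!` line is to try it as an expression first (so `math.sqrt(16)` echoes its value) and fall back to running it as statements (so `import math` works). The sketch below assumes a plain dictionary as the shared namespace; in the real chatbot, the availability of `self.messages` suggests code runs with access to the chatbot object, which this sketch does not model. `run_python` is hypothetical, and like the feature itself it executes arbitrary code, so only use it on input you trust.

```python
import contextlib
import io


def run_python(code: str, namespace: dict) -> str:
    """Evaluate `code` as an expression if possible, otherwise execute it; return captured stdout."""
    try:
        expression = compile(code, "<chat>", "eval")
    except SyntaxError:
        expression = None  # not an expression, e.g. `import math`
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        if expression is not None:
            result = eval(expression, namespace)
            if result is not None:
                print(repr(result))  # echo the value, like an interactive prompt
        else:
            exec(compile(code, "<chat>", "exec"), namespace)
    return buffer.getvalue()


session = {}  # persists between calls, so imports stay available
run_python("import math", session)
print(run_python("math.sqrt(16)", session), end="")  # 4.0
```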
The chatbot automatically discovers and uses tools from connected MCP servers. Simply ask questions that require tool usage, and the chatbot will automatically call the appropriate tools.
The chatbot uses a system prompt to define its behavior and personality. You can customize this using the --prompt command line option, which supports:
- Direct strings: --prompt "You are a helpful assistant"
- File paths: --prompt ./my_prompt.txt
- Opik prompt names: --prompt my_optimized_prompt
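The three accepted forms need an order of precedence, which is not stated here. The sketch below assumes file path first, then Opik prompt name, then literal text. `resolve_system_prompt` is hypothetical, and the Opik lookup is injected as a plain function (a dictionary's `get` in the example) so it runs without an Opik account.

```python
import os
from typing import Callable, Optional


def resolve_system_prompt(value: str, fetch_opik_prompt: Callable[[str], Optional[str]]) -> str:
    """Resolve --prompt as a file path, then an Opik prompt name, then a literal string."""
    if os.path.isfile(value):  # isfile() returns False rather than raising for long or odd strings
        with open(value, encoding="utf-8") as handle:
            return handle.read()
    stored = fetch_opik_prompt(value)  # hypothetical lookup returning None when no prompt matches
    if stored is not None:
        return stored
    return value  # fall back to treating the argument as the prompt text itself


# A dictionary stands in for prompts stored in Opik
opik_prompts = {"my_optimized_prompt": "You are a concise, tool-using assistant."}
print(resolve_system_prompt("my_optimized_prompt", opik_prompts.get))
print(resolve_system_prompt("You are a helpful assistant", opik_prompts.get))
```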
By default, the chatbot uses this system prompt:
You are a helpful AI system for answering questions that can be answered
with any of the available tools.
You can override the default system prompt to customize the chatbot's behavior:
# Direct string prompts
ez-mcp-chatbot --prompt "You are an expert Python developer who helps with coding tasks."
ez-mcp-chatbot --prompt "You are a data scientist who specializes in analyzing datasets and creating visualizations."
ez-mcp-chatbot --prompt "You are a friendly AI assistant who loves to help users with their questions and tasks."
# Load prompt from file
ez-mcp-chatbot --prompt ./my_custom_prompt.txt
# Load prompt from Opik (if you have optimized prompts stored there)
ez-mcp-chatbot --prompt my_optimized_coding_assistant
The system prompt affects how the chatbot:
- Interprets user requests
- Decides which tools to use
- Structures its responses
- Maintains conversation context
The chatbot includes built-in Opik observability integration:
For the command-line flag --opik:
- hosted (default): Use hosted Opik service
- local: Use local Opik instance
- disabled: Disable Opik tracing
Set environment variables for Opik:
# For hosted mode
export OPIK_API_KEY=your_opik_api_key
# For local mode
export OPIK_LOCAL_URL=http://localhost:8080
# Use hosted Opik (default)
ez-mcp-chatbot --opik hosted
# Use local Opik
ez-mcp-chatbot --opik local
# Disable Opik
ez-mcp-chatbot --opik disabled
# Use custom system prompt
ez-mcp-chatbot --prompt "You are a helpful coding assistant"
# Combine options
ez-mcp-chatbot --prompt "You are a data analysis expert" --opik local --debug
# Use custom tools file
ez-mcp-chatbot --tools-file "my_tools.py"
# Use tools file from URL
ez-mcp-chatbot --tools-file "https://example.com/my_tools.py"
# Override model arguments
ez-mcp-chatbot --model-args '{"temperature": 0.7, "max_tokens": 1000}'
# Override both model and model arguments
ez-mcp-chatbot --model "openai/gpt-4" --model-args '{"temperature": 0.3, "max_tokens": 2000}'
- config_path: Path to the configuration file (default: ez-config.json)
- --opik {local,hosted,disabled}: Opik tracing mode (default: hosted)
- --init: Create a default ez-config.json file and exit
- --debug: Enable debug output during processing
- --prompt TEXT: Custom system prompt for the chatbot (overrides default)
- --model MODEL: Override the model specified in the config file
- --tools-file TOOLS_FILE: Path to a Python file containing tool definitions, or URL to download the file from. If provided, an MCP server configuration is created using this file.
- --model-args MODEL_ARGS: JSON string of additional keyword arguments to pass to the LLM model
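The `--model` and `--model-args` overrides can be pictured as a small merge over the loaded ez-config.json. Whether ez-mcp-chatbot merges `--model-args` into `model_kwargs` or replaces them outright is not stated here, so the merge below is an assumption, and `apply_overrides` is a hypothetical helper.

```python
import json
from typing import Optional


def apply_overrides(config: dict, model: Optional[str] = None, model_args: Optional[str] = None) -> dict:
    """Return a copy of the loaded config with --model and --model-args applied."""
    merged = dict(config)  # shallow copy so the loaded config is left untouched
    if model:
        merged["model"] = model
    if model_args:
        # Assumption: --model-args keys are layered over the config's model_kwargs
        merged["model_kwargs"] = {**config.get("model_kwargs", {}), **json.loads(model_args)}
    return merged


config = {"model": "openai/gpt-4o-mini", "model_kwargs": {"temperature": 0.2}}
print(apply_overrides(config, model="openai/gpt-4", model_args='{"temperature": 0.3, "max_tokens": 2000}'))
# {'model': 'openai/gpt-4', 'model_kwargs': {'temperature': 0.3, 'max_tokens': 2000}}
```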
A command-line utility for evaluating LLM applications using Opik's evaluation framework. This tool provides a simple interface to run evaluations on datasets with various metrics, enabling you to measure and improve your LLM application's performance.
- Dataset Evaluation: Run evaluations on your datasets using Opik's evaluation framework
- Multiple Metrics: Support for various evaluation metrics (Hallucination, LevenshteinRatio, etc.)
- Opik Integration: Full integration with Opik for observability and tracking
- Flexible Configuration: Customizable prompts, models, and evaluation parameters
- Rich Output: Beautiful console output with progress tracking and results display
ez-mcp-eval --prompt "Answer the question" --dataset "my-dataset" --metric "Hallucination"
ez-mcp-eval [-h] [--prompt PROMPT] [--dataset DATASET] [--metric METRIC]
            [--metrics-file METRICS_FILE] [--experiment-name EXPERIMENT_NAME]
            [--opik {local,hosted,disabled}] [--debug] [--input INPUT]
            [--output OUTPUT] [--list-metrics] [--model MODEL]
            [--model-kwargs MODEL_KWARGS] [--config CONFIG] [--tools-file TOOLS_FILE]
            [--num NUM]
- --prompt PROMPT: The prompt to use for evaluation
- --dataset DATASET: Name of the dataset to evaluate on
- --metric METRIC: Name of the metric(s) to use for evaluation (comma-separated for multiple)
- --metrics-file METRICS_FILE: Path to a Python file containing metric definitions (alternative to using opik.evaluation.metrics)
- --experiment-name EXPERIMENT_NAME: Name for the evaluation experiment (default: ez-mcp-evaluation)
- --opik {local,hosted,disabled}: Opik tracing mode (default: hosted)
- --debug: Enable debug output
- --input INPUT: Input field name in the dataset (default: input)
- --output OUTPUT: Output field mapping in the format reference=DATASET_FIELD (default: reference=answer)
- --list-metrics: List all available metrics and exit
- --model MODEL: LLM model to use for evaluation (default: gpt-3.5-turbo)
- --model-kwargs MODEL_KWARGS: JSON string of additional keyword arguments for the LLM model
- --config CONFIG: Path to MCP server configuration file (default: ez-config.json)
- --tools-file TOOLS_FILE: Path to a Python file containing tool definitions, or URL to download the file from. If provided, an MCP server configuration is created using this file.
- --num NUM: Number of dataset items to evaluate (takes the first N items; default: all items)
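The `--metric` and `--output` formats described above can be parsed as in this sketch. It is an illustration under the documented formats only: `parse_eval_args` is hypothetical, and rejecting an `--output` value without `=` is an assumption about how ez-mcp-eval treats malformed input.

```python
from typing import List, Tuple


def parse_eval_args(metric: str, output: str = "reference=answer") -> Tuple[List[str], Tuple[str, str]]:
    """Split --metric on commas and --output into (metric parameter, dataset field)."""
    metrics = [name.strip() for name in metric.split(",") if name.strip()]
    key, sep, field = output.partition("=")
    if not (sep and key.strip() and field.strip()):
        raise ValueError(f"--output must look like reference=DATASET_FIELD, got {output!r}")
    return metrics, (key.strip(), field.strip())


print(parse_eval_args("Hallucination,LevenshteinRatio"))
# (['Hallucination', 'LevenshteinRatio'], ('reference', 'answer'))
```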
The ez-mcp-eval command supports loading datasets from two sources:
- Opik datasets: If the dataset exists in your Opik account, it will be loaded directly
- opik_optimizer.datasets: If the dataset is not found in Opik, the tool automatically checks for a function with the same name in opik_optimizer.datasets and creates the dataset using that function
This allows you to use both pre-existing Opik datasets and dynamically generated datasets from the opik_optimizer package.
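The two-source lookup above can be sketched with injectable lookups so it runs without an Opik account. Everything here is hypothetical: `load_dataset` is not an ez-mcp-toolbox function, datasets are modeled as plain lists of dicts, a dictionary stands in for the Opik workspace, a namespace object stands in for opik_optimizer.datasets, and the optional `num` argument mirrors `--num` (first N items).

```python
import types
from typing import Callable, List, Optional


def load_dataset(
    name: str,
    get_opik_dataset: Callable[[str], Optional[List[dict]]],
    optimizer_datasets: object,
    num: Optional[int] = None,
) -> List[dict]:
    """Try Opik first, then a same-named factory function, and keep the first `num` items."""
    items = get_opik_dataset(name)
    if items is None:
        factory = getattr(optimizer_datasets, name, None)
        if not callable(factory):
            raise LookupError(f"Dataset {name!r} not found in Opik or opik_optimizer.datasets")
        items = factory()  # build the dataset on demand
    items = list(items)
    return items if num is None else items[:num]


# Stand-ins for an Opik workspace and the opik_optimizer.datasets module
opik_store = {"qa-dataset": [{"input": "Capital of France?", "answer": "Paris"}]}
fake_datasets = types.SimpleNamespace(
    my_optimizer_dataset=lambda: [{"input": "2+2?", "answer": "4"}, {"input": "3+3?", "answer": "6"}]
)

print(load_dataset("my_optimizer_dataset", opik_store.get, fake_datasets, num=1))
```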
# Simple evaluation with Hallucination metric
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "Hallucination"
# Evaluate with multiple metrics
ez-mcp-eval --prompt "Summarize this text" --dataset "summarization-dataset" --metric "Hallucination,LevenshteinRatio"
# Use a custom experiment name
ez-mcp-eval --prompt "Translate to French" --dataset "translation-dataset" --metric "LevenshteinRatio" --experiment-name "french-translation-test"
# Use a different model with custom parameters
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --model "gpt-4" --model-kwargs '{"temperature": 0.7, "max_tokens": 1000}'
# Use a dataset from opik_optimizer.datasets (automatically created if not in Opik)
ez-mcp-eval --prompt "Answer the question" --dataset "my_optimizer_dataset" --metric "Hallucination"
# Custom input and output field mappings
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --input "question" --output "reference=answer"
The ez-mcp-eval command includes automatic validation of input and output field mappings to prevent common configuration errors:
Input field validation:
- What it checks: The --input field must exist in the dataset items
- When it runs: Before starting the evaluation
- Error handling: If the field doesn't exist, the command stops with a clear error message listing the available fields
Output field validation:
- What it checks:
  - The --output VALUE (dataset field) must exist in the dataset items
  - The --output KEY (metric parameter) must be a valid parameter of the selected metric(s)' score method
- When it runs: Before starting the evaluation
- Error handling: If validation fails, the command stops with clear error messages
# Input field not found in dataset
❌ Input field 'question' not found in dataset items
   Available fields: input, answer
# Output field not found in dataset
❌ Reference field 'response' not found in dataset items
   Available fields: input, answer
# Invalid metric parameter
❌ Output reference 'reference' is not a valid parameter for metric 'LevenshteinRatio' score method
   Available parameters: output, reference
This validation helps catch configuration errors early, saving time and preventing failed evaluations.
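The checks described above can be sketched with `inspect.signature`, reading the metric's `score()` parameters to validate the `--output` key. This is a hypothetical reimplementation for illustration (`validate_mappings` and `DemoMetric` are not part of ez-mcp-toolbox), and it assumes every dataset item shares the first item's keys.

```python
import inspect
from typing import List, Tuple


def validate_mappings(items: List[dict], input_field: str, output: Tuple[str, str], metric) -> List[str]:
    """Check --input and --output against the dataset and the metric's score() signature."""
    fields = list(items[0]) if items else []  # assumes all items share the first item's keys
    available = ", ".join(fields)
    errors = []
    if input_field not in fields:
        errors.append(f"Input field '{input_field}' not found in dataset items\n   Available fields: {available}")
    key, field = output
    if field not in fields:
        errors.append(f"Reference field '{field}' not found in dataset items\n   Available fields: {available}")
    # Named parameters of score(), ignoring *args / **kwargs catch-alls
    params = [
        name
        for name, p in inspect.signature(metric.score).parameters.items()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]
    if key not in params:
        errors.append(
            f"Output reference '{key}' is not a valid parameter for metric "
            f"'{type(metric).__name__}' score method\n   Available parameters: {', '.join(params)}"
        )
    return errors


class DemoMetric:
    """Hypothetical metric with an Opik-style score() method."""

    def score(self, output, reference, **ignored_kwargs):
        return 1.0 if output == reference else 0.0


items = [{"input": "2+2?", "answer": "4"}]
for message in validate_mappings(items, "question", ("expected", "answer"), DemoMetric()):
    print("❌", message)
```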
# Use custom metrics defined in a Python file
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "CustomMetric" --metrics-file "my_metrics.py"
# Use a custom tools file for MCP server configuration
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --tools-file "my_tools.py"
# Use tools file from URL
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --tools-file "https://example.com/my_tools.py"
# See all available metrics
ez-mcp-eval --list-metrics
# Enable debug output for troubleshooting
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "Hallucination" --debug
You can define custom metrics in a Python file and use them with the --metrics-file option. The metric file should contain metric classes that follow the same interface as Opik's built-in metrics.
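# my_metrics.py
from opik.evaluation.metrics import base_metric, score_result
class CustomMetric(base_metric.BaseMetric):
    def __init__(self, name: str = "CustomMetric"):
        super().__init__(name=name)
    def score(self, output: str, reference: str, **ignored_kwargs):
        # Your custom evaluation logic here
        # Return a score between 0 and 1
        return score_result.ScoreResult(value=0.8, name=self.name)  # Example score
Then use it with: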
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "CustomMetric" --metrics-file "my_metrics.py"
The ez-mcp-eval tool integrates seamlessly with Opik for:
- Dataset Management: Load datasets from your Opik workspace
- Prompt Management: Use prompts stored in Opik or provide direct text
- Experiment Tracking: Track evaluation experiments with custom names
- Observability: Full tracing of LLM calls and evaluation processes
For Opik integration, set up your environment:
# For hosted Opik
export OPIK_API_KEY=your_opik_api_key
# For local Opik
export OPIK_LOCAL_URL=http://localhost:8080
The tool supports all metrics available in Opik's evaluation framework. Use --list-metrics to see the complete list, which includes:
- Hallucination: Detect hallucinated content in responses
- LevenshteinRatio: Measure text similarity using Levenshtein distance
- ExactMatch: Check for exact string matches
- F1Score: Calculate F1 score for classification tasks
- And many more...
The tool provides rich console output including:
- Progress tracking during evaluation
- Dataset information and statistics
- Evaluation results and metrics
- Error handling and debugging information
- Integration with Opik's experiment tracking
A command-line utility for optimizing LLM applications using Opik's optimization framework. This tool provides a simple interface to run prompt optimization on datasets with various metrics and optimizers, enabling you to improve your LLM application's performance through automated optimization.
- Prompt Optimization: Run optimization on your prompts using Opik's optimization framework
- Multiple Optimizers: Support for various optimization algorithms (EvolutionaryOptimizer, FewShotBayesianOptimizer, etc.)
- Opik Integration: Full integration with Opik for observability and tracking
- Flexible Configuration: Customizable prompts, models, and optimization parameters
- Rich Output: Beautiful console output with progress tracking and results display
ez-mcp-optimize --prompt "Answer the question" --dataset "my-dataset" --metric "Hallucination"
ez-mcp-optimize [-h] [--prompt PROMPT] [--dataset DATASET] [--metric METRIC]
                [--metrics-file METRICS_FILE] [--experiment-name EXPERIMENT_NAME]
                [--opik {local,hosted,disabled}] [--debug] [--input INPUT]
                [--output OUTPUT] [--list-metrics] [--model MODEL]
                [--model-kwargs MODEL_KWARGS] [--config CONFIG] [--tools-file TOOLS_FILE]
                [--num NUM] [--optimizer OPTIMIZER] [--class-kwargs CLASS_KWARGS]
                [--optimize-kwargs OPTIMIZE_KWARGS]
- --prompt PROMPT: The prompt to use for optimization
- --dataset DATASET: Name of the dataset to optimize on
- --metric METRIC: Name of the metric(s) to use for optimization (comma-separated for multiple)
- --metrics-file METRICS_FILE: Path to a Python file containing metric definitions (alternative to using opik.evaluation.metrics)
- --experiment-name EXPERIMENT_NAME: Name for the optimization experiment (default: ez-mcp-optimization)
- --opik {local,hosted,disabled}: Opik tracing mode (default: hosted)
- --debug: Enable debug output
- --input INPUT: Input field name in the dataset (default: input)
- --output OUTPUT: Output field mapping. Accepts 'REFERENCE=FIELD', 'REFERENCE:FIELD', or just 'FIELD'. If only FIELD is provided, it is used as the ChatPrompt user field. (default: reference=answer)
- --list-metrics: List all available metrics and exit
- --model MODEL: LLM model to use for optimization (default: gpt-3.5-turbo)
- --model-kwargs MODEL_KWARGS: JSON string of additional keyword arguments for the LLM model
- --config CONFIG: Path to MCP server configuration file (default: ez-config.json)
- --tools-file TOOLS_FILE: Path to a Python file containing tool definitions, or URL to download the file from. If provided, an MCP server configuration is created using this file.
- --num NUM: Number of dataset items to use for optimization (takes the first N items; default: all items)
- --optimizer OPTIMIZER: Optimizer class to use for optimization (default: EvolutionaryOptimizer)
- --class-kwargs CLASS_KWARGS: JSON string of keyword arguments to pass to the optimizer constructor
- --optimize-kwargs OPTIMIZE_KWARGS: JSON string of keyword arguments to pass to the optimize_prompt() method
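The three `--output` forms accepted by ez-mcp-optimize can be parsed as in this sketch. `parse_output_mapping` is hypothetical; it checks `=` before `:` (an assumption for values containing both) and returns `None` as the reference key for a bare FIELD, which per the option description becomes the ChatPrompt user field.

```python
from typing import Optional, Tuple


def parse_output_mapping(value: str = "reference=answer") -> Tuple[Optional[str], str]:
    """Parse --output as REFERENCE=FIELD, REFERENCE:FIELD, or a bare FIELD."""
    for separator in ("=", ":"):
        key, found, field = value.partition(separator)
        if found:
            return key.strip(), field.strip()
    # Bare FIELD: no reference key; the field is used as the ChatPrompt user field
    return None, value.strip()


for raw in ("reference=answer", "reference:answer", "question"):
    print(raw, "->", parse_output_mapping(raw))
```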
The tool supports various optimization algorithms:
- EvolutionaryOptimizer (default): Genetic algorithm-based optimization
- FewShotBayesianOptimizer: Bayesian optimization with few-shot examples
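- MetaPromptOptimizer: LLM-driven meta-prompting that critiques and rewrites the prompt
- GepaOptimizer: Reflective, evolutionary prompt optimization (GEPA, Genetic-Pareto)
- MiproOptimizer: Multi-prompt instruction proposal optimization (MIPRO, from DSPy)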
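Selecting `--optimizer` by name and applying `--class-kwargs` and `--optimize-kwargs` follows a common look-up-by-name pattern. The sketch below uses a stub module in place of opik_optimizer so it runs standalone; `build_optimizer` and `StubOptimizer` are hypothetical, and the real optimizer constructors and `optimize_prompt()` likely take further arguments (such as the model, prompt, dataset, and metric) that are omitted here.

```python
import json
import types


def build_optimizer(module, name: str = "EvolutionaryOptimizer", class_kwargs: str = "{}"):
    """Look up an optimizer class by name and construct it with the --class-kwargs JSON."""
    optimizer_cls = getattr(module, name, None)
    if optimizer_cls is None:
        raise ValueError(f"Unknown optimizer: {name}")
    return optimizer_cls(**json.loads(class_kwargs))


class StubOptimizer:
    """Hypothetical stand-in for an opik_optimizer class."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def optimize_prompt(self, **kwargs):
        # A real optimizer would run the search; the stub just echoes its arguments
        return kwargs


stub_module = types.SimpleNamespace(EvolutionaryOptimizer=StubOptimizer)
optimizer = build_optimizer(stub_module, class_kwargs='{"population_size": 50, "mutation_rate": 0.1}')
result = optimizer.optimize_prompt(**json.loads('{"auto_continue": true, "n_samples": 100}'))
print(optimizer.kwargs, result)
```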
# Simple optimization with Hallucination metric
ez-mcp-optimize --prompt "Answer the question" --dataset "qa-dataset" --metric "Hallucination"
# Optimize with multiple metrics
ez-mcp-optimize --prompt "Summarize this text" --dataset "summarization-dataset" --metric "Hallucination,LevenshteinRatio"
# Use a different optimizer
ez-mcp-optimize --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --optimizer "FewShotBayesianOptimizer"
# Use custom optimizer parameters
ez-mcp-optimize --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --class-kwargs '{"population_size": 50, "mutation_rate": 0.1}'
# Use custom optimization parameters
ez-mcp-optimize --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --optimize-kwargs '{"auto_continue": true, "n_samples": 100}'
The ez-mcp-optimize tool integrates seamlessly with Opik for:
- Dataset Management: Load datasets from your Opik workspace
- Prompt Management: Use prompts stored in Opik or provide direct text
- Experiment Tracking: Track optimization experiments with custom names
- Observability: Full tracing of LLM calls and optimization processes
For Opik integration, set up your environment:
# For hosted Opik
export OPIK_API_KEY=your_opik_api_key
# For local Opik
export OPIK_LOCAL_URL=http://localhost:8080
The tool provides rich console output including:
- Progress tracking during optimization
- Dataset information and statistics
- Optimization results and metrics
- Error handling and debugging information
- Integration with Opik's experiment tracking
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Documentation: GitHub Repository
- Issues: GitHub Issues
- Built with Model Context Protocol (MCP)
- Powered by LiteLLM
- Observability by Opik
- Rich console interface by Rich
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Run tests: pytest
- Format code: black . && isort .
- Commit your changes: git commit -m "Add feature"
- Push to the branch: git push origin feature-name
- Submit a pull request
- Python 3.8 or higher
- OpenAI, Anthropic, or other LLM provider API key (for chatbot functionality)
# Clone the repository
git clone https://github.com/comet-ml/ez-mcp-toolbox.git
cd ez-mcp-toolbox
# Install in development mode
pip install -e .
# Or install with development dependencies
pip install -e ".[dev]"
pip install -r requirements.txt