Let's say you're testing the PayPal MCP server. You can write a test case with the prompt "Create a refund order for order 412". The test runs the prompt and checks whether the right PayPal tool was called.
The CLI helps you:
1. Test different prompts and observe how LLMs interact with your MCP server. The CLI shows a trace of the conversation.
2. Examine the quality of your server's tool names and descriptions. See where LLMs hallucinate when using your server.
3. Analyze your MCP server's performance, such as token consumption and behavior with different models.
4. Benchmark your MCP server's performance to catch future regressions.
The nice thing about a CLI is that you can run these tests iteratively. Please give it a try; we would really appreciate your feedback.
1.1.6 • Public • Published 4 hours ago
We built a CLI that performs MCP evals and end-to-end (E2E) testing. The CLI creates a simulated end-user environment and tests popular user flows.
Evals helps you:
npm install -g @mcpjam/cli
To set up, create a new directory for your tests. In that directory, create three files:
- environment.json to set up your MCP server connections
- tests.json to configure your tests
- llms.json to store your LLM API keys

The environment.json file is configured much like an mcp.json file. For servers with OAuth, you must provide your own Bearer API tokens. MCPJam CLI does not handle OAuth flows / DCR. For bearer tokens, make sure to wrap your header with requestInit.
{
  "servers": {
    "asana": {
      "url": "https://mcp.asana.com/sse",
      "requestInit": {
        "headers": {
          "Authorization": "Bearer <ASANA_API_KEY>"
        }
      }
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"],
      "env": {
        "ENV_1": "<ENV_1>"
      }
    }
  }
}
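Before running the evals, it can help to sanity-check the file. The sketch below is not part of MCPJam; `check_environment` is a hypothetical helper, and its rules are only inferred from the example above (each server has either a `url` or a `command`, and HTTP headers sit under `requestInit`). The CLI's own validation may differ.

```python
def check_environment(config: dict) -> list[str]:
    """Return a list of problems found in an environment.json-style dict.

    Assumed rules, inferred from the README example rather than from the CLI:
    - there is a non-empty top-level "servers" object
    - each server defines exactly one of "url" (remote) or "command" (local)
    - headers are nested under "requestInit", not placed on the server entry
    """
    servers = config.get("servers")
    if not isinstance(servers, dict) or not servers:
        return ["top-level 'servers' object is missing or empty"]

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if ("url" in entry) == ("command" in entry):
            problems.append(f"{name}: expected exactly one of 'url' or 'command'")
        if "headers" in entry:
            problems.append(f"{name}: 'headers' should be nested under 'requestInit'")
    return problems
```

To check a file on disk, load it with `json.load` and pass the resulting dict to `check_environment`; an empty list means none of the assumed rules were violated.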
The test file is an array of tests.
[
  {
    "title": "Workspace test",
    "query": "What is my asana workspace?",
    "runs": 1, // Number of times to run this test
    "model": "anthropic/claude-3.7-sonnet",
    "provider": "openrouter", // Provider name: "anthropic" | "openai" | "openrouter"
    "expectedToolCalls": ["asana_list_workspaces"]
  },
  {
    "title": "Workspace users test",
    "query": "Can you figure out who is in the workspace?",
    "runs": 1,
    "model": "anthropic/claude-3.7-sonnet",
    "provider": "openrouter",
    "expectedToolCalls": ["asana_list_workspaces", "asana_get_workspace_users"]
  }
]
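To make the idea behind `expectedToolCalls` concrete, here is a minimal sketch of one way such a check could work. It is an illustration only: `missing_tool_calls` is a hypothetical function, and the matching rule (every expected tool must appear among the calls the model made, extra calls and ordering ignored) is an assumption, not MCPJam's documented behavior.

```python
def missing_tool_calls(expected: list[str], actual: list[str]) -> list[str]:
    """Return the expected tool names that the model never called.

    Assumption: a test passes when this list is empty. Extra tool calls
    and call order are ignored in this sketch.
    """
    called = set(actual)
    return [name for name in expected if name not in called]


# Example using the tool names from the second test above.
expected = ["asana_list_workspaces", "asana_get_workspace_users"]
actual = ["asana_list_workspaces"]  # hypothetical trace from one run
print(missing_tool_calls(expected, actual))
```

A stricter variant could also compare call order or reject unexpected calls; which rule is appropriate depends on how tightly you want to pin down the model's behavior.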
{
  "anthropic": "<ANTHROPIC_API_KEY>",
  "openai": "<OPENAI_API_KEY>",
  "openrouter": "<OPENROUTER_API_KEY>"
}
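If you would rather not keep API keys in a hand-edited file, one option is to generate llms.json from environment variables. The sketch below assumes the variable names `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and `OPENROUTER_API_KEY` (taken from the placeholders above, not mandated by the CLI) and assumes the CLI accepts a file that lists only the providers you actually use.

```python
import json
import os

# Provider key in llms.json -> environment variable name (assumed names).
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def build_llms_config(environ=None) -> dict:
    """Collect the API keys that are set and non-empty, skipping the rest."""
    environ = os.environ if environ is None else environ
    return {
        provider: environ[var]
        for provider, var in PROVIDER_ENV_VARS.items()
        if environ.get(var)
    }


def write_llms_config(path: str = "llms.json") -> None:
    """Write the collected keys to `path` in the llms.json shape shown above."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_llms_config(), fh, indent=2)
```

Calling `write_llms_config()` in the test directory produces the file; keep it out of version control since it contains secrets.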
mcpjam evals run --tests tests.json --environment environment.json --llms llms.json
mcpjam evals run -t tests.json -e environment.json -l llms.json
- --tests, -t <file>: Path to the tests configuration file (required)
- --environment, -e <file>: Path to the environment configuration file (required)
- --llms, -l <file>: Path to the LLM API key configuration file
- --help, -h: Show help information
- --version, -V: Display version number
License: Apache-2.0