I built a CLI to test and eval MCP servers

Hi folks, we've been working on a CLI tool to programmatically test and eval MCP servers. We're looking for some initial feedback on the project.

Let's say you're testing the PayPal MCP server. You can write a test case with the prompt "Create a refund order for order 412". The test runs the prompt and checks whether the right PayPal tool was called.

The CLI helps you:
1. Test different prompts and observe how LLMs interact with your MCP server. The CLI shows a trace of the conversation.
2. Examine the quality of your server's tool names and descriptions, and see where LLMs hallucinate when using your server.
3. Analyze your MCP server's performance, such as token consumption and behavior across different models.
4. Benchmark your MCP server's performance to catch future regressions.

The nice thing about a CLI is that you can run these tests iteratively! Please give it a try; we'd really appreciate your feedback.

npm

1.1.6 • Public • Published 4 hours ago

We built a CLI that performs MCP evals and end-to-end (E2E) testing. The CLI creates a simulated end user’s environment and tests popular user flows.

Evals help you:

Discover workflows that are breaking your server and get actionable ways to resolve them.
Benchmark your server’s performance and catch regressions in future changes.
Programmatically test queries on an MCP server with a single command. No more doing QA one query at a time.

npm install -g @mcpjam/cli

To set up, create a new directory for your tests. In that directory, create three files:

environment.json to set up your MCP server connections
tests.json to configure your tests
llms.json to store your LLM API keys

environment.json is configured much like an mcp.json file. For servers with OAuth, you must provide your own bearer API tokens; MCPJam CLI does not handle OAuth flows or DCR. For bearer tokens, make sure to wrap the Authorization header in requestInit.

{
  "servers": {
    "asana": {
      "url": "https://mcp.asana.com/sse",
      "requestInit": {
        "headers": {
          "Authorization": "Bearer <ASANA_API_KEY>"
        }
      }
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"],
      "env": {
        "ENV_1": "<ENV_1>"
      }
    }
  }
}
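If you would rather not paste tokens into environment.json by hand, the file can be generated from environment variables. The following is only a sketch: the type names and the helpers remoteServer and buildEnvironment are hypothetical, the shapes are inferred from the example above rather than from the CLI's actual schema, and ASANA_API_KEY is assumed to be the variable holding your token.

```typescript
// Shapes inferred from the example config above; the real schema may allow more fields.
interface RemoteServer {
  url: string;
  requestInit?: { headers: Record<string, string> };
}

interface LocalServer {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface Environment {
  servers: Record<string, RemoteServer | LocalServer>;
}

// Hypothetical helper: wraps a bearer token in requestInit, as described above.
function remoteServer(url: string, bearerToken?: string): RemoteServer {
  if (!bearerToken) {
    return { url };
  }
  return {
    url,
    requestInit: { headers: { Authorization: `Bearer ${bearerToken}` } },
  };
}

// Hypothetical helper: builds the example environment from a map of variables.
// Pass process.env in Node; a plain object works for testing.
function buildEnvironment(vars: Record<string, string | undefined>): Environment {
  const token = vars["ASANA_API_KEY"];
  if (!token) {
    throw new Error("ASANA_API_KEY is not set");
  }
  return {
    servers: {
      asana: remoteServer("https://mcp.asana.com/sse", token),
      "sequential-thinking": {
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-sequential-thinking"],
      },
    },
  };
}
```

In Node you could then write JSON.stringify(buildEnvironment(process.env), null, 2) to environment.json before each run, which keeps the token itself out of version control.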

The test file is an array of tests.

[
  {
    "title": "Workspace test",
    "query": "What is my asana workspace?",
    "runs": 1,
    "model": "anthropic/claude-3.7-sonnet",
    "provider": "openrouter",
    "expectedToolCalls": ["asana_list_workspaces"]
  },
  {
    "title": "Workspace users test",
    "query": "Can you figure out who is in the workspace?",
    "runs": 1,
    "model": "anthropic/claude-3.7-sonnet",
    "provider": "openrouter",
    "expectedToolCalls": ["asana_list_workspaces", "asana_get_workspace_users"]
  }
]

runs is the number of times to run the test. provider is the provider name: "anthropic", "openai", or "openrouter".
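To make the idea behind expectedToolCalls concrete, here is a rough sketch of how such a check could work. This is not the CLI's actual matching logic, which isn't documented here: checkToolCalls and ToolCallCheck are hypothetical names, and the sketch assumes a test passes when every expected tool was called at least once, regardless of order or extra calls.

```typescript
interface ToolCallCheck {
  passed: boolean;
  missing: string[];    // expected tools the model never called
  unexpected: string[]; // tools the model called that were not listed
}

// Assumption: order is ignored and extra calls do not fail the test.
function checkToolCalls(expected: string[], actual: string[]): ToolCallCheck {
  const missing = expected.filter((name) => actual.indexOf(name) === -1);
  const unexpected = actual.filter((name) => expected.indexOf(name) === -1);
  return { passed: missing.length === 0, missing, unexpected };
}

// Example: the "Workspace users test" where the model stopped after one call.
const result = checkToolCalls(
  ["asana_list_workspaces", "asana_get_workspace_users"],
  ["asana_list_workspaces"],
);
// result.passed is false and result.missing is ["asana_get_workspace_users"]
```

A stricter variant could also fail on unexpected calls or require a specific order; which one you want depends on how tightly you need to pin down the model's behavior.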

llms.json maps each provider name to its API key:

{
  "anthropic": "<ANTHROPIC_API_KEY>",
  "openai": "<OPENAI_API_KEY>",
  "openrouter": "<OPENROUTER_API_KEY>"
}
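Since each test names a provider, a quick pre-flight check can catch a missing key before you spend a run on it. The sketch below assumes the provider field in tests.json lines up with the keys in llms.json, which the examples suggest but the README does not state outright. missingProviderKeys is a hypothetical helper, not part of the CLI.

```typescript
type Provider = "anthropic" | "openai" | "openrouter";
type LlmKeys = Partial<Record<Provider, string>>;

// Only the fields this check needs; real test entries carry more.
interface TestCaseProvider {
  title: string;
  provider: Provider;
}

// Returns the providers used by the tests that have no usable key.
// A value still starting with "<" is treated as an unfilled placeholder.
function missingProviderKeys(tests: TestCaseProvider[], keys: LlmKeys): Provider[] {
  const missing: Provider[] = [];
  for (const test of tests) {
    const key = keys[test.provider];
    const unusable = !key || key.charAt(0) === "<";
    if (unusable && missing.indexOf(test.provider) === -1) {
      missing.push(test.provider);
    }
  }
  return missing;
}
```

For the two example tests, which both use "openrouter", this returns an empty array once a real OpenRouter key is in place, even if the other two entries are still placeholders.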

Run the evals with:

mcpjam evals run --tests tests.json --environment environment.json --llms llms.json

Or, using the short flags:

mcpjam evals run -t tests.json -e environment.json -l llms.json

--tests, -t <file>: Path to the tests configuration file (required)
--environment, -e <file>: Path to the environment configuration file (required)
--llms, -l <file>: Path to the LLM API key configuration file
--help, -h: Show help information
--version, -V: Display version number
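If you want to launch the evals from a Node script (for example in CI), the flags above map onto an argument list fairly directly. This is a minimal sketch: EvalRunOptions and buildEvalArgs are hypothetical names, and only the flags documented above are used.

```typescript
interface EvalRunOptions {
  tests: string;       // path for --tests (required)
  environment: string; // path for --environment (required)
  llms?: string;       // path for --llms
}

// Builds the argument list for `mcpjam evals run` from the documented flags.
function buildEvalArgs(options: EvalRunOptions): string[] {
  const args = [
    "evals",
    "run",
    "--tests",
    options.tests,
    "--environment",
    options.environment,
  ];
  if (options.llms) {
    args.push("--llms", options.llms);
  }
  return args;
}
```

You could pass the result to spawnSync("mcpjam", args, { stdio: "inherit" }) from node:child_process. How the CLI reports failures through its exit code isn't covered here, so check that before relying on it to gate a build.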

Weekly downloads: 320
Version: 1.1.6
License: Apache-2.0
Unpacked size: 636 kB
