Agent skill
mcp-server-evaluations
Evaluate MCP servers for quality and reliability. Verify tool functionality, test error handling, generate tests, and assess response quality with no dependencies other than curl. Use this when validating MCP server implementations, testing OpenAPI-to-MCP conversions, or assessing API tool quality.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/mcp-server-evaluations-skills
Metadata
Additional technical details for this skill
- author
- mcp.com.ai agent-skills
- version
- 0.1.0
SKILL.md
MCP Server Evaluations Skill
Systematically evaluate MCP servers to ensure they function correctly, handle errors gracefully, and meet quality standards.
Workflow
Phase 1: Environment Verification
- Verify MCP server is running
bash
curl -s http://localhost:3030/health # Expected: 200 OK curl -s -X POST http://localhost:3030/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' # Expected: {"jsonrpc":"2.0","id":1,"result":{}}
Phase 2: Tool Discovery
-
List all available tools
bashcurl -X POST http://localhost:3030/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' -
Verify tool completeness
- All OpenAPI operations exposed as tools
- Tool names follow consistent convention (e.g.,
getUsers,createOrder) - Descriptions are clear and actionable
- Required vs optional parameters clearly marked
- Parameter types match OpenAPI schema
-
Document discovered tools — Create inventory of tools for systematic testing.
Phase 3: Functional Testing
For each discovered tool:
-
Basic functionality test
bashcurl -X POST http://localhost:3030/mcp \ -H "Content-Type: application/json" \ -d '{ "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "<tool_name>", "arguments": { <valid_arguments> } }, "id": 2 }' -
Verify response structure
- Response contains expected data
- Data types match schema
- No unexpected null values
- Pagination works (if applicable)
-
Error handling test — Call with invalid/missing arguments:
bashcurl -X POST http://localhost:3030/mcp \ -H "Content-Type: application/json" \ -d '{ "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "<tool_name>", "arguments": {} }, "id": 3 }' -
Verify error response quality
- Error message is actionable
- Missing required parameters identified
- HTTP status codes propagated correctly
Phase 4: Question-Based Evaluation
Generate and test with realistic user questions:
-
Generate 10+ test questions covering:
- Simple single-tool queries
- Multi-step workflows requiring multiple tools
- Edge cases (empty results, large datasets)
- Error scenarios (invalid IDs, unauthorized access)
-
Execute each question through MCP client or Inspector
-
Score responses using evaluation criteria:
- Correctness: Does the answer match expected result?
- Completeness: Is all relevant information included?
- Clarity: Is the response well-structured?
- Performance: Response time within acceptable limits?
Phase 5: Quality Scoring
Calculate overall quality score:
| Category | Weight | Criteria |
|---|---|---|
| Tool Discovery | 20% | All operations exposed, proper naming |
| Basic Functionality | 30% | Valid inputs return correct responses |
| Error Handling | 20% | Graceful errors with actionable messages |
| Question Accuracy | 20% | Test questions answered correctly |
| Performance | 10% | Response times < 5s for standard ops |
Pass threshold: 80% overall score
Quick Evaluation Checklist
Run this minimal check for fast validation:
# 1. Health check
curl -s http://localhost:3030/health | grep -q "" && echo "✓ Health OK" || echo "✗ Health FAILED"
# 2. MCP ping
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"
# 3. Tools list
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools | length' | xargs -I {} echo "✓ {} tools discovered"
# 4. Sample tool call (adjust tool name and args)
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"listPets","arguments":{}},"id":2}' | jq '.result' > /dev/null && echo "✓ Tool call OK" || echo "✗ Tool call FAILED"
Test Question Templates
Use these patterns to generate effective test questions:
- List/Query: "Show me all [resources] that match [criteria]"
- Get Details: "What are the details of [resource] with ID [id]?"
- Create: "Create a new [resource] with [properties]"
- Update: "Update [resource] [id] to change [field] to [value]"
- Delete: "Remove [resource] with ID [id]"
- Aggregate: "How many [resources] exist with [status]?"
- Search: "Find [resources] where [field] contains [term]"
- Workflow: "Create a [resource], then update it, then list all"
References
For detailed documentation:
- references/mcp-inspector-guide.md — Inspector setup & usage
- references/evaluation-criteria.md — Quality metrics & scoring
- references/question-templates.md — Test question generation
Example: Petstore API Evaluation
# 1. Run health checks
curl -s http://localhost:3030/health
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"
# 2. Tool discovery
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools'
# 3. Test questions:
# - "List all available pets"
# - "Show details of pet with ID 1"
# - "Find pets with status 'available'"
# - "Create a new pet named 'Fluffy'"
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?