mcp-server-evaluations

Evaluate MCP servers for quality and reliability. Verify tool functionality, test error handling, generate tests, and assess response quality using only curl (plus jq to parse JSON responses). Use this skill when validating MCP server implementations, testing OpenAPI-to-MCP conversions, or assessing API tool quality.



Install this agent skill into your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/mcp-server-evaluations-skills

Metadata


author: mcp.com.ai agent-skills
version: 0.1.0

SKILL.md

MCP Server Evaluations Skill

Systematically evaluate MCP servers to ensure they function correctly, handle errors gracefully, and meet quality standards.

Workflow

Phase 1: Environment Verification

Verify that the MCP server is running

bash

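# Print only the HTTP status code (/health is server-specific, not part of MCP)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3030/health
# Expected: 200

# MCP ping; Streamable HTTP servers may also require
# -H "Accept: application/json, text/event-stream"
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}'
# Expected: {"jsonrpc":"2.0","id":1,"result":{}}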
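The two checks above can be wrapped in one function that fails fast. This is a minimal sketch, assuming the server exposes /health and a JSON-RPC endpoint at /mcp as in the commands above; verify_environment is a hypothetical helper name, and jq is used to inspect the ping reply.

```bash
# Hypothetical helper: fails if /health is not HTTP 200 or ping has no valid result.
verify_environment() {
  local base=$1 status
  # -o /dev/null discards the body; -w prints just the status code ("000" if unreachable)
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base/health")
  if [ "$status" != "200" ]; then
    echo "✗ Health FAILED (HTTP $status)" >&2
    return 1
  fi
  # jq -e exits non-zero unless the reply is a JSON-RPC 2.0 response with a result
  if ! curl -s --max-time 5 -X POST "$base/mcp" \
      -H "Content-Type: application/json" \
      -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' \
      | jq -e '.jsonrpc == "2.0" and .result != null' > /dev/null; then
    echo "✗ Ping FAILED" >&2
    return 1
  fi
  echo "✓ Environment OK"
}

# Usage: verify_environment http://localhost:3030
```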

Phase 2: Tool Discovery

List all available tools

bash

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

Verify tool completeness
- All OpenAPI operations exposed as tools
- Tool names follow a consistent convention (e.g., getUsers, createOrder)
- Descriptions are clear and actionable
- Required vs. optional parameters clearly marked (the required array in each tool's inputSchema)
- Parameter types match the OpenAPI schema
Document discovered tools: create an inventory of tools for systematic testing.
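One way to build that inventory is to flatten the tools/list response into one tab-separated line per tool: name, required parameters, description. A sketch, assuming jq is installed and that each tool lists its required parameters in inputSchema.required as MCP tool definitions do; tool_inventory is a hypothetical name.

```bash
# Hypothetical helper: reads a tools/list response on stdin and prints
# one line per tool: name <TAB> comma-separated required params <TAB> description
tool_inventory() {
  jq -r '.result.tools[]
    | [.name,
       ((.inputSchema.required // []) | join(",")),
       (.description // "")]
    | @tsv'
}

# Usage: pipe the tools/list curl command from Phase 2 into tool_inventory
```

Tools with an empty second column take no required parameters, which makes them good first candidates for the basic functionality tests.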

Phase 3: Functional Testing

For each discovered tool:

Basic functionality test

bash

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "<tool_name>",
      "arguments": { <valid_arguments> }
    },
    "id": 2
  }'
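Hand-editing the JSON body becomes error-prone once arguments contain quotes or nested objects. A sketch that builds the tools/call request with jq instead, assuming jq 1.5 or later (for --argjson); tool_call_payload is a hypothetical helper, and listPets is only an example tool name.

```bash
# Hypothetical helper: $1 = tool name, $2 = arguments as a JSON object, $3 = request id
tool_call_payload() {
  jq -cn --arg name "$1" --argjson args "$2" --argjson id "$3" \
    '{jsonrpc: "2.0", method: "tools/call", params: {name: $name, arguments: $args}, id: $id}'
}

# Usage: send the generated body with curl (-d @- reads it from stdin)
# tool_call_payload listPets '{"limit": 5}' 2 |
#   curl -s -X POST http://localhost:3030/mcp -H "Content-Type: application/json" -d @-
```

Because jq parses the arguments with --argjson, a malformed argument object fails locally before any request is sent.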

Verify response structure
- Response contains expected data
- Data types match schema
- No unexpected null values
- Pagination works (if applicable)
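A tools/call result carries its data inside result.content (a list of content items), so checking for unexpected nulls usually means parsing the text items first. A sketch, assuming jq and a server that places JSON-encoded API responses in text content items; summarize_result is hypothetical, and text items that are not valid JSON are skipped.

```bash
# Hypothetical helper: reads a tools/call response on stdin and reports how many
# content items came back and how many null values appear in JSON text items.
summarize_result() {
  jq -r '
    (.result.content // []) as $c
    | [$c[] | select(.type == "text") | .text | fromjson?] as $parsed
    | "content_items=\($c | length) nulls=\([$parsed[] | .. | select(. == null)] | length)"'
}
```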

Error handling test (call the tool with missing or invalid arguments):

bash

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "<tool_name>",
      "arguments": {}
    },
    "id": 3
  }'

Verify error response quality
- Error message is actionable
- Missing required parameters are named in the message
- Upstream HTTP status codes (e.g., 404, 401) are reflected in the error
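When judging error responses, note that the MCP specification separates protocol errors (a JSON-RPC error object, e.g. code -32602 for invalid params) from tool execution errors (a normal result with isError set to true). A sketch that labels a response accordingly, assuming jq; classify_response is a hypothetical name.

```bash
# Hypothetical helper: reads a JSON-RPC response on stdin and prints its category.
classify_response() {
  jq -r '
    if .error then
      "protocol_error: \(.error.code) \(.error.message)"
    elif .result.isError == true then
      "tool_error: \([.result.content[]? | select(.type == "text") | .text] | join(" "))"
    elif .result then
      "ok"
    else
      "invalid_response"
    end'
}
```

An empty arguments call that comes back as "ok" is itself a finding: the server accepted a request that should have been rejected.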

Phase 4: Question-Based Evaluation

Generate and test with realistic user questions:

Generate 10+ test questions covering:
- Simple single-tool queries
- Multi-step workflows requiring multiple tools
- Edge cases (empty results, large datasets)
- Error scenarios (invalid IDs, unauthorized access)
Execute each question through an MCP client or the MCP Inspector.
Score responses using these evaluation criteria:
- Correctness: Does the answer match the expected result?
- Completeness: Is all relevant information included?
- Clarity: Is the response well-structured?
- Performance: Is the response time within acceptable limits?
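Response time can be measured with curl's own timing variables rather than an external timer. A sketch assuming the same /mcp endpoint; time_request and within_limit are hypothetical helpers, and the 5-second limit is taken from the Performance row of the Phase 5 table.

```bash
# Hypothetical helper: $1 = MCP endpoint URL, $2 = JSON-RPC request body.
# Prints curl's total transfer time in seconds.
time_request() {
  curl -s -o /dev/null -w '%{time_total}\n' -X POST "$1" \
    -H "Content-Type: application/json" -d "$2"
}

# Hypothetical helper: exits 0 when $1 (seconds) is at or under the limit $2.
within_limit() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t + 0 <= max + 0) }'
}

# Usage:
# t=$(time_request http://localhost:3030/mcp '{"jsonrpc":"2.0","method":"tools/list","id":1}')
# within_limit "$t" 5 && echo "✓ ${t}s" || echo "✗ ${t}s exceeds 5s"
```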

Phase 5: Quality Scoring

Calculate the overall quality score as a weighted sum of the category scores:

Category	Weight	Criteria
Tool Discovery	20%	All operations exposed, proper naming
Basic Functionality	30%	Valid inputs return correct responses
Error Handling	20%	Graceful errors with actionable messages
Question Accuracy	20%	Test questions answered correctly
Performance	10%	Response times < 5s for standard ops

Pass threshold: 80% overall score
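With each category scored 0 to 100, the overall score is 0.2 * discovery + 0.3 * functionality + 0.2 * error handling + 0.2 * question accuracy + 0.1 * performance. A sketch in awk using integer weights to avoid rounding noise; quality_score is a hypothetical name and the argument order is an assumption.

```bash
# Hypothetical helper. Arguments (each 0-100): discovery, functionality,
# error handling, question accuracy, performance.
quality_score() {
  awk -v d="$1" -v f="$2" -v e="$3" -v q="$4" -v p="$5" 'BEGIN {
    score = (20*d + 30*f + 20*e + 20*q + 10*p) / 100
    printf "%.1f %s\n", score, (score >= 80 ? "PASS" : "FAIL")
  }'
}

quality_score 100 90 80 70 60   # → 83.0 PASS
```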

Quick Evaluation Checklist

Run this minimal check for fast validation:

bash

# 1. Health check (-f makes curl fail on HTTP status >= 400)
curl -sf -o /dev/null http://localhost:3030/health && echo "✓ Health OK" || echo "✗ Health FAILED"

# 2. MCP ping
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"

# 3. Tools list (jq -e fails unless .result.tools is an array)
n=$(curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq -e '.result.tools | arrays | length') \
  && echo "✓ $n tools discovered" || echo "✗ Tools list FAILED"

# 4. Sample tool call (adjust tool name and args); fails on JSON-RPC errors
#    and on tool results flagged with isError
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"listPets","arguments":{}},"id":2}' | jq -e '.result and .result.isError != true' > /dev/null && echo "✓ Tool call OK" || echo "✗ Tool call FAILED"

Test Question Templates

Use these patterns to generate effective test questions:

List/Query: "Show me all [resources] that match [criteria]"
Get Details: "What are the details of [resource] with ID [id]?"
Create: "Create a new [resource] with [properties]"
Update: "Update [resource] [id] to change [field] to [value]"
Delete: "Remove [resource] with ID [id]"
Aggregate: "How many [resources] exist with [status]?"
Search: "Find [resources] where [field] contains [term]"
Workflow: "Create a [resource], then update it, then list all"
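Filling the bracketed placeholders by hand for 10+ questions is tedious; a small bash function can substitute key=value pairs into a template. A sketch, assuming bash (it uses [[ ]] and +=); fill_template is hypothetical and the values shown are examples.

```bash
# Hypothetical helper: $1 = template; remaining args are key=value pairs
# that replace every literal "[key]" placeholder in the template.
fill_template() {
  local out=$1 pair key val ph rest acc
  shift
  for pair in "$@"; do
    key=${pair%%=*}
    val=${pair#*=}
    ph="[$key]"
    acc="" rest=$out
    # Walk left to right, copying text before each placeholder plus the value
    while [[ $rest == *"$ph"* ]]; do
      acc+=${rest%%"$ph"*}$val
      rest=${rest#*"$ph"}
    done
    out=$acc$rest
  done
  printf '%s\n' "$out"
}

fill_template "Update [resource] [id] to change [field] to [value]" \
  resource=pet id=1 field=status value=sold
# → Update pet 1 to change status to sold
```

Prefix and suffix removal is used instead of ${var//pattern/replacement} so that characters such as & in a value are never treated specially.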

References

For detailed documentation:

references/mcp-inspector-guide.md — Inspector setup & usage
references/evaluation-criteria.md — Quality metrics & scoring
references/question-templates.md — Test question generation

Example: Petstore API Evaluation

bash

# 1. Run health checks
curl -s http://localhost:3030/health
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"

# 2. Tool discovery
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools'

# 3. Test questions:
# - "List all available pets"
# - "Show details of pet with ID 1"
# - "Find pets with status 'available'"
# - "Create a new pet named 'Fluffy'"
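To check the "All OpenAPI operations exposed as tools" criterion for the Petstore server, compare the discovered tool names against the operationIds you expect. A sketch assuming jq; missing_tools is hypothetical, and listPets, createPets and showPetById are placeholder names to replace with the operationIds from your OpenAPI document.

```bash
# Hypothetical helper: stdin = a tools/list response; args = expected tool names
# (without spaces). Prints each expected name that the server does not expose.
missing_tools() {
  jq -r --arg expected "$*" '
    (.result.tools | map(.name)) as $have
    | ($expected | split(" "))[]
    | select(. as $n | ($have | map(select(. == $n)) | length) == 0)'
}

# Usage:
# curl -s -X POST http://localhost:3030/mcp -H "Content-Type: application/json" \
#   -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' |
#   missing_tools listPets createPets showPetById
```

Empty output means every expected operation was found.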

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/mcp-server-evaluations-skills
License: MIT License
