troubleshooting-config-item

Troubleshoots infrastructure and application configuration items in Mission Control by diagnosing health issues, analyzing recent changes, and investigating resource relationships. Use when users ask about unhealthy or failing resources, mention specific config items by name or ID, inquire about Kubernetes pods/deployments/services, AWS EC2 instances/volumes, Azure VMs, or other infrastructure components. Also use when investigating why a resource is down, stopped, degraded, or showing errors, or when analyzing what changed that caused an issue.

Install this agent skill in your project:

npx add-skill https://github.com/flanksource/claude-code-plugin/tree/main/skills/troubleshooting-config-item

SKILL.md

Config Item Troubleshooting Skill

Core Purpose

This skill enables Claude to troubleshoot infrastructure and application configuration items in Mission Control, diagnose health issues, analyze changes, and identify root causes through systematic investigation of config relationships and history.

Understanding Config Items

A ConfigItem represents a discoverable infrastructure or application configuration (Kubernetes Pods, AWS EC2 instances, Azure VMs, database instances, etc.). Each config item contains:

health: Overall health status ("healthy", "unhealthy", "warning", "unknown")
status: Operational state (e.g., "Running", "Stopped", "Pending")
description: Human-readable description (often contains error messages when unhealthy)
.config: The actual JSON specification/manifest (e.g., Kubernetes Pod spec, AWS instance details)
type: The kind of resource (e.g., "Kubernetes::Pod", "AWS::EC2::Instance")
tags: Metadata for filtering and organization
parent_id/path: Hierarchical relationships to other configs
external_id: External system identifier
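
To make the field list above concrete, here is a minimal Python sketch of a config item as a dataclass. It is an illustration only, not the actual Mission Control schema: the class name, the field types (for example, `tags` as a string-to-string mapping), and the `needs_attention` helper are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConfigItem:
    """Illustrative container for the fields listed above (not the real schema)."""

    id: str
    name: str
    type: str                       # e.g. "Kubernetes::Pod"
    health: str = "unknown"         # "healthy", "unhealthy", "warning", "unknown"
    status: str = ""                # e.g. "Running", "Stopped", "Pending"
    description: str = ""           # often carries the error message
    config: dict[str, Any] = field(default_factory=dict)  # the ".config" JSON
    tags: dict[str, str] = field(default_factory=dict)    # assumed key/value shape
    parent_id: Optional[str] = None
    path: str = ""
    external_id: str = ""

    def needs_attention(self) -> bool:
        # Hypothetical helper: anything other than "healthy" deserves a closer look.
        return self.health != "healthy"


# Made-up sample values for illustration.
pod = ConfigItem(
    id="cfg-1",
    name="api-7d9f",
    type="Kubernetes::Pod",
    health="unhealthy",
    status="Pending",
    description="Back-off pulling image",
)
print(pod.needs_attention())  # True
```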

Key Workflows

Initial Investigation

1. Search and Identify the Config

Use the MCP search_catalog tool to find the config item:

Search by id, name, type, tags, or other attributes
Narrow down to the specific config experiencing issues

2. Get Complete Config Details

Use the MCP describe_catalog tool to retrieve full config information:

Review the health field for overall status
Check the status field for operational state
Read the description field carefully - this often contains error messages or status information
Examine the .config JSON field - this contains the full specification/manifest
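
The review order in step 2 can be sketched as a small Python function over a config item that has already been retrieved. The sketch assumes the item arrives as a plain dictionary keyed by the field names above; the `triage` function, its output keys, and the sample values are hypothetical and are not part of any MCP tool.

```python
def triage(item: dict) -> dict:
    """Summarise the fields step 2 says to review, in the same order."""
    health = item.get("health", "unknown")
    return {
        "health": health,
        "status": item.get("status", ""),
        # The description often carries the error text, so surface it verbatim.
        "evidence": item.get("description", ""),
        # Whether a specification/manifest is available to inspect.
        "has_spec": bool(item.get("config")),
        "investigate_further": health != "healthy",
    }


# Made-up sample item for illustration.
item = {
    "health": "unhealthy",
    "status": "Pending",
    "description": "Back-off pulling image",
    "config": {"spec": {"containers": [{"image": "api:1.4.3-rc"}]}},
}
summary = triage(item)
print(summary["investigate_further"])  # True
```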

Change Analysis

3. Review Recent Changes

If the issue isn't immediately apparent, use the MCP search_catalog_changes tool:

Get changes for the specific config item
Look for recent modifications to the specification
Check change_type (created, updated, deleted)
Review severity (critical, high, medium, low, info)
Examine patches and diff fields to see what changed
Check source to understand where the change originated
Note the created_at timestamp to correlate with when issues started
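
One way to correlate `created_at` with the start of an issue is to keep only the changes made shortly before it and rank them by severity. The Python sketch below assumes each change is a dictionary with ISO 8601 `created_at` strings without a timezone suffix; the 24-hour window, the `suspect_changes` name, and the sample `source` values are illustrative assumptions, not tool behaviour.

```python
from datetime import datetime, timedelta

# Lower rank sorts first; the levels come from the severity list above.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def suspect_changes(changes, issue_started, window=timedelta(hours=24)):
    """Return changes made within `window` before the issue, most severe first."""
    candidates = [
        change for change in changes
        if issue_started - window
        <= datetime.fromisoformat(change["created_at"])
        <= issue_started
    ]
    # Newest first, then a stable sort by severity keeps recency as tie-breaker.
    candidates.sort(key=lambda c: datetime.fromisoformat(c["created_at"]), reverse=True)
    candidates.sort(key=lambda c: SEVERITY_RANK.get(c.get("severity", "info"), 4))
    return candidates


# Made-up change records for illustration.
changes = [
    {"change_type": "updated", "severity": "info", "source": "kubernetes",
     "created_at": "2024-05-01T09:50:00"},
    {"change_type": "updated", "severity": "high", "source": "gitops",
     "created_at": "2024-05-01T09:30:00"},
    {"change_type": "created", "severity": "low", "source": "kubernetes",
     "created_at": "2024-04-20T08:00:00"},
]
result = suspect_changes(changes, issue_started=datetime(2024, 5, 1, 10, 0))
print([c["severity"] for c in result])  # ['high', 'info']
```

Ranking by severity before recency is a design choice for this sketch; sorting purely by proximity to the incident time is an equally reasonable starting point.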

Relationship Navigation

4. Investigate Related Configs

Use the MCP get_related_configs tool to navigate the config hierarchy:

Children: Resources created/managed by this config
- Example: A Kubernetes Deployment → ReplicaSets → Pods
- Example: An AWS Auto Scaling Group → EC2 Instances
Parents: Resources that manage this config
- Example: A Pod → ReplicaSet → Deployment
Dependencies: Resources this config depends on
- Example: A Pod → ConfigMaps, Secrets, PersistentVolumeClaims

Troubleshooting Pattern: When a parent resource is unhealthy, investigate its children to find the actual failing component. When a child is unhealthy, check the parent for misconfigurations.
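
The parent-to-child half of this pattern can be sketched as a depth-first walk that follows unhealthy children until it reaches the deepest failing component. The Python example below works on in-memory stand-ins: `children_of` and `items` are hypothetical dictionaries that would be filled from related-config lookups, and `failing_leaves` is a name invented for this sketch.

```python
def failing_leaves(item_id, children_of, items):
    """Walk down from an unhealthy item to its deepest unhealthy descendants.

    Assumes `item_id` itself is already known to be unhealthy.
    """
    unhealthy_children = [
        child for child in children_of.get(item_id, [])
        if items[child]["health"] != "healthy"
    ]
    if not unhealthy_children:
        # No failing child: this item is the deepest failing component.
        return [item_id]
    leaves = []
    for child in unhealthy_children:
        leaves.extend(failing_leaves(child, children_of, items))
    return leaves


# Made-up hierarchy: Deployment -> ReplicaSet -> Pods.
children_of = {"deploy": ["rs"], "rs": ["pod-a", "pod-b"]}
items = {
    "deploy": {"health": "unhealthy"},
    "rs": {"health": "unhealthy"},
    "pod-a": {"health": "healthy"},
    "pod-b": {"health": "unhealthy"},
}
print(failing_leaves("deploy", children_of, items))  # ['pod-b']
```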

Critical Requirements

Hierarchical Thinking:

Kubernetes: Namespace → Deployment → ReplicaSet → Pod → Container
AWS: VPC → Subnet → EC2 Instance → Volume
Azure: Resource Group → VM → Disk

Change Impact Analysis:

Compare current config with previous working state
Identify what changed and when
Correlate timing of changes with health degradation
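
Comparing the current config with a previous working state amounts to a recursive diff of two JSON documents. The following Python sketch assumes both versions of the `.config` field are available as nested dictionaries; the `diff_config` helper and the sample specs are illustrative and do not reflect how Mission Control computes its own patches or diffs.

```python
def diff_config(previous, current, prefix=""):
    """Return (path, old, new) for every value that differs between two dicts."""
    changes = []
    for key in sorted(set(previous) | set(current)):
        path = f"{prefix}.{key}" if prefix else key
        old, new = previous.get(key), current.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse so the reported path points at the exact changed field.
            changes.extend(diff_config(old, new, path))
        elif old != new:
            changes.append((path, old, new))
    return changes


# Made-up specs for illustration.
previous = {"spec": {"image": "api:1.4.2", "replicas": 3}}
current = {"spec": {"image": "api:1.4.3-rc", "replicas": 3}}
print(diff_config(previous, current))
# [('spec.image', 'api:1.4.2', 'api:1.4.3-rc')]
```

The dotted path in each result gives a specific field to cite as evidence alongside the change timestamp.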

Evidence-Based Diagnosis:

Support conclusions with specific evidence from the config data
Quote relevant error messages from description fields
Reference specific fields in the .config JSON
Cite change diffs and timestamps

Diagnosis Workflow

Follow this systematic approach:

1. Identify - Find the config item
2. Assess - Review health, status, description, and .config spec
3. Analyze Changes - Check recent modifications and events
4. Navigate Relationships - Investigate parent/child/dependency configs
5. Review Analysis - Check automated findings
6. Synthesize - Determine root cause from all evidence
7. Recommend - Provide specific remediation steps

Example Troubleshooting Scenarios

Scenario 1: Unhealthy Kubernetes Deployment

1. Get Deployment details → health: unhealthy
2. Get related configs (children) → ReplicaSets → Pods
3. Find Pod in ImagePullBackOff
4. Check Pod .config → image pull error
5. Check changes → recent image tag update
6. Root cause: Invalid image tag deployed
7. Recommendation: Roll back to the previous image or fix the image tag

Scenario 2: AWS EC2 Instance Issues

1. Get Instance details → status: stopped, health: unhealthy
2. Check description → "InsufficientInstanceCapacity"
3. Review changes → instance type changed to unavailable type
4. Get related configs → Security Groups, Volumes
5. Root cause: Requested instance type not available in AZ
6. Recommendation: Change to available instance type or different AZ

Maintainer

flanksource Core maintainer

Source details

Full Name: flanksource/claude-code-plugin
Branch: main
Path in repo: skills/troubleshooting-config-item

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

flanksource/claude-code-plugin

write-canary-transformations

Write correct transform blocks for Mission Control canary checks including fan-out, inline, and generated canary patterns. Use when adding transformations to canary checks, splitting a single check into multiple results, modifying check output, or generating child canaries from discovered resources.

flanksource/claude-code-plugin

troubleshooting-health-checks

Debugs and troubleshoots Mission Control health checks by analyzing check configurations, reviewing failure patterns, and identifying root causes. Use when users ask about failing health checks, mention specific health check names or IDs, inquire why a health check is failing or unhealthy, or need help understanding health check errors and timeouts.

flanksource/claude-code-plugin

write-canary-tests

Write correct test blocks and assertions for Mission Control canary health checks. Use when creating canaries that need pass/fail conditions, adding test expressions, or writing assertions based on HTTP status, JSON response, exec output, or Kubernetes health.

flanksource/claude-code-plugin

troubleshooting-notifications

Investigates Mission Control notifications to identify root causes and provide remediation. Use when users mention notification IDs, ask about alerts or notifications, request help understanding "why did I get this notification", want to troubleshoot a specific alert, or ask about notification patterns and history. This skill retrieves notification details, analyzes historical patterns, routes to resource-specific troubleshooting (config items or health checks), correlates findings, and delivers actionable remediation steps with prevention recommendations.

flanksource/claude-code-plugin

promotion-eval-create

Create a promotion evaluation template for any system by gathering requirements through structured questions and generating a reusable evaluation skill. Use when users ask to create a promotion check, release readiness evaluation, environment health template, or want to build a custom evaluation workflow for systems beyond Mission Control.

flanksource/claude-code-plugin

promotion-eval-mission-control

Evaluates a Mission Control environment's platform health for release or promotion readiness. Checks health check pipelines, config scrapers, background jobs, notifications, event queues, and MC infrastructure. Use for pre-release checks, environment promotion, or environment status. Triggers: "check environment health", "is it ready for release", "pre-release health check", "evaluate environment", "promotion readiness", "environment status"
