Agent skill
data-analysis
Analyze datasets to extract insights, identify patterns, and generate reports. Use when exploring data, creating visualizations, or performing statistical analysis. Handles CSV, JSON, SQL queries, and Python pandas operations.
Install this agent skill to your Project
npx add-skill https://github.com/autohandai/community-skills/tree/main/data-analysis
Metadata
Additional technical details for this skill
- tags: data, analysis, pandas, statistics, visualization, csv, sql
- platforms: Claude, ChatGPT, Gemini
SKILL.md
Data Analysis
When to use this skill
- Data exploration: Understand a new dataset
- Report generation: Derive data-driven insights
- Quality validation: Check data consistency
- Decision support: Make data-driven recommendations
Instructions
Step 1: Load and explore data
Python (Pandas):
import pandas as pd
import numpy as np
# Load CSV
df = pd.read_csv('data.csv')
# Basic info
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.describe())
print(df.head(10))
# Check missing values
print(df.isnull().sum())
# Data types
print(df.dtypes)
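The skill description lists JSON among the supported inputs, but the step above only shows CSV. The following is a minimal sketch of loading nested JSON records, assuming they arrive as a list of objects; the sample records and field names are invented for illustration.

```python
import json

import pandas as pd

# Hypothetical sample records; in practice these would come from a file or an API.
raw_json = (
    '[{"id": 1, "user": {"name": "Ana", "city": "Lima"}, "value": 10.5},'
    ' {"id": 2, "user": {"name": "Bo", "city": null}, "value": null}]'
)
records = json.loads(raw_json)

# json_normalize flattens nested objects into dotted column names
# such as "user.name" and "user.city".
df_json = pd.json_normalize(records)

# The same exploration calls used for CSV data apply from here on.
print(df_json.dtypes)
print(df_json.isnull().sum())
```

For flat JSON files, `pd.read_json` is usually enough; `json_normalize` is worth reaching for when the records contain nested objects.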
SQL:
-- Inspect table schema (DESCRIBE is MySQL syntax; PostgreSQL and SQLite use other commands)
DESCRIBE table_name;
-- Sample data
SELECT * FROM table_name LIMIT 10;
-- Basic stats
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT column_name) AS unique_values,
    MIN(numeric_column) AS min_val,
    MAX(numeric_column) AS max_val,
    AVG(numeric_column) AS avg_val
FROM table_name;
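The basic-stats query can also be run from Python so that the result lands in a DataFrame. This is a sketch using an in-memory SQLite database; the `sales` table and its columns are hypothetical stand-ins for `table_name`. SQLite does not support `DESCRIBE`, so the schema is inspected with `PRAGMA table_info` instead.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 30.0)],
)

# SQLite's equivalent of inspecting the table schema.
schema = pd.read_sql_query("PRAGMA table_info(sales)", conn)

# Same shape as the basic-stats query above, with concrete column names.
stats = pd.read_sql_query(
    """
    SELECT
        COUNT(*) AS total_rows,
        COUNT(DISTINCT region) AS unique_values,
        MIN(amount) AS min_val,
        MAX(amount) AS max_val,
        AVG(amount) AS avg_val
    FROM sales
    """,
    conn,
)
conn.close()

print(schema[["name", "type"]])
print(stats)
```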
Step 2: Data cleaning
# Handle missing values (assign the result back; inplace=True on a selected column may not update df in recent pandas)
df['column'] = df['column'].fillna(df['column'].mean())
df.dropna(subset=['required_column'], inplace=True)
# Remove duplicates
df.drop_duplicates(inplace=True)
# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')
# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) & (df['value'] <= Q3 + 1.5*IQR)]
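The cleaning steps above modify `df` directly, while the constraints further down ask for raw data to be preserved. One possible way to reconcile the two is to wrap the steps in a function that works on a copy. This is a sketch, not a definitive pipeline; the `clean` function name and the sample values are assumptions.

```python
import pandas as pd


def clean(raw: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Return a cleaned copy of `raw`; the input frame is left untouched."""
    df = raw.copy()

    # Fill missing values with the column mean (plain assignment, no inplace).
    df[value_col] = df[value_col].fillna(df[value_col].mean())

    # Drop exact duplicate rows.
    df = df.drop_duplicates()

    # Keep only rows inside the 1.5 * IQR fences.
    q1 = df[value_col].quantile(0.25)
    q3 = df[value_col].quantile(0.75)
    iqr = q3 - q1
    in_range = df[value_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


# Hypothetical data: one missing value, one duplicate, one extreme value.
raw = pd.DataFrame({"value": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0, None]})
cleaned = clean(raw)
print(f"rows before: {len(raw)}, rows after: {len(cleaned)}")
```

Filling with the mean before removing outliers lets the extreme value leak into the imputed value (the missing entry becomes 93.0 here and survives the filter). Removing outliers first, or filling with the median, are alternatives worth considering depending on the data.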
Step 3: Statistical analysis
# Descriptive statistics
print(df['numeric_column'].describe())
# Grouped analysis
grouped = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other': 'nunique'
})
print(grouped)
# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)
# Pivot table
pivot = pd.pivot_table(
    df,
    values='sales',
    index='region',
    columns='month',
    aggfunc='sum'
)
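The dictionary form of `agg` used in the grouped analysis returns MultiIndex columns such as `('value', 'mean')`, which can be awkward to export or reference in a report. Named aggregation is one alternative that produces flat column names. The sample data below is invented for illustration.

```python
import pandas as pd

# Hypothetical data with the same column roles as the grouped analysis above.
df = pd.DataFrame(
    {
        "category": ["a", "a", "b", "b", "b"],
        "value": [1.0, 3.0, 2.0, 4.0, 6.0],
        "other": ["x", "y", "x", "x", "z"],
    }
)

# Each keyword becomes an output column: name=(source column, aggregation).
summary = df.groupby("category").agg(
    value_mean=("value", "mean"),
    value_sum=("value", "sum"),
    value_count=("value", "count"),
    other_unique=("other", "nunique"),
)
print(summary)
```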
Step 4: Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')
# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')
# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')
# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')
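When these plots are produced by an agent or a script rather than in a notebook, there is often no display available, and figures that are never closed accumulate in memory. The sketch below shows one way to handle both, assuming matplotlib is installed; the sample data and the temporary output path are assumptions.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"value": [1, 2, 2, 3, 3, 3, 4, 4, 5]})

# Write to the system temp directory so the example leaves the project tree alone.
out_path = os.path.join(tempfile.gettempdir(), "histogram.png")

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(df["value"], bins=5)
ax.set_title("Distribution of Values")
ax.set_xlabel("value")
ax.set_ylabel("count")
fig.tight_layout()
fig.savefig(out_path, dpi=150)
plt.close(fig)  # release the figure so repeated plotting does not accumulate memory
```

Using the explicit `fig, ax` objects instead of the implicit `plt` state makes it harder to draw onto the wrong figure when several charts are generated in sequence.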
Step 5: Derive insights
# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')
# Trend analysis
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100
# Segment analysis
segments = df.groupby('segment').agg({
    'revenue': 'sum',
    'customers': 'nunique',
    'orders': 'count'
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']
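To make the trend calculation concrete, here is a small worked example of the monthly growth computation on invented data. The first month has no previous period to compare against, so its growth is `NaN` and should not be reported as zero.

```python
import pandas as pd

# Hypothetical transactions spread over three months.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-15"]
        ),
        "value": [100.0, 100.0, 250.0, 200.0],
    }
)

df["month"] = df["date"].dt.to_period("M")
monthly_trend = df.groupby("month")["value"].sum()  # Jan 200, Feb 250, Mar 200

# Percentage change relative to the previous month.
growth = monthly_trend.pct_change() * 100  # NaN, 25.0, -20.0
print(growth.round(1))
```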
Output format
Analysis report structure
# Data Analysis Report
## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD
## 2. Key findings
- Insight 1
- Insight 2
- Insight 3
## 3. Statistical summary
| Metric | Value |
|------|-----|
| Mean | X.XX |
| Median | X.XX |
| Std dev | X.XX |
## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
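The overview and statistical summary sections of this template can be filled in directly from a DataFrame. The sketch below shows one way to do that; the `render_report` function, its parameters, and the sample data are hypothetical. Sections 2 and 4 are left out because findings and recommendations need human or agent judgment rather than a formula.

```python
import pandas as pd


def render_report(df: pd.DataFrame, name: str, value_col: str, date_col: str) -> str:
    """Fill the overview and statistical-summary sections of the report template."""
    values = df[value_col]
    lines = [
        "# Data Analysis Report",
        "## 1. Dataset overview",
        f"- Dataset: {name}",
        f"- Records: {len(df):,}",
        f"- Columns: {df.shape[1]}",
        f"- Date range: {df[date_col].min():%Y-%m-%d} ~ {df[date_col].max():%Y-%m-%d}",
        "## 3. Statistical summary",
        "| Metric | Value |",
        "|------|-----|",
        f"| Mean | {values.mean():.2f} |",
        f"| Median | {values.median():.2f} |",
        f"| Std dev | {values.std():.2f} |",  # sample standard deviation (ddof=1)
    ]
    return "\n".join(lines)


# Hypothetical sample data.
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "value": [1.0, 2.0, 3.0],
    }
)
report = render_report(sample, name="sample", value_col="value", date_col="date")
print(report)
```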
Best practices
- Understand the data first: Learn structure and meaning before analysis
- Incremental analysis: Move from simple to complex analyses
- Use visualization: Use a variety of charts to spot patterns
- Validate assumptions: Always verify assumptions about the data
- Reproducibility: Document analysis code and results
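The "validate assumptions" practice can be made explicit by encoding each assumption as a check that runs before the analysis. This is a minimal sketch; the specific rules (unique `id`, non-negative `value`, no missing `date`) and the column names are illustrative assumptions, not requirements of the skill.

```python
import pandas as pd


def check_assumptions(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if df["id"].duplicated().any():
        problems.append("id is not unique")
    if df["value"].lt(0).any():
        problems.append("value contains negatives")
    if df["date"].isnull().any():
        problems.append("date has missing entries")
    return problems


# Hypothetical data that violates two of the three assumptions.
data = pd.DataFrame(
    {
        "id": [1, 2, 2],
        "value": [5.0, -1.0, 3.0],
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    }
)
problems = check_assumptions(data)
print(problems)
```

Returning a list instead of raising on the first failure lets the report document every violated assumption at once.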
Constraints
Required rules (MUST)
- Preserve raw data (work on a copy)
- Document the analysis process
- Validate results
Prohibited (MUST NOT)
- Do not expose sensitive personal data
- Do not draw unsupported conclusions
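One way to respect the rule on sensitive personal data is to replace identifying columns with tokens before any output is produced. The sketch below uses a salted SHA-256 hash; the `pseudonymize` function, the salt, and the column names are hypothetical. Note that this is pseudonymization rather than anonymization: dropping the columns entirely is safer when they are not needed for the analysis.

```python
import hashlib

import pandas as pd


def pseudonymize(df: pd.DataFrame, columns: list[str], salt: str) -> pd.DataFrame:
    """Return a copy of `df` with the given columns replaced by short salted hashes."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str).map(
            lambda v: hashlib.sha256((salt + v).encode("utf-8")).hexdigest()[:12]
        )
    return out


# Hypothetical data containing an identifying column.
people = pd.DataFrame(
    {
        "email": ["ana@example.com", "bo@example.com", "ana@example.com"],
        "value": [10, 20, 30],
    }
)
safe = pseudonymize(people, columns=["email"], salt="replace-with-a-secret")
print(safe)
```

Because the same input always maps to the same token, grouping and counting by the pseudonymized column still work.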