Agent skill
data-context-extractor-phase-1-database-connection-discovery
Sub-skill of data-context-extractor: Phase 1: Database Connection & Discovery (+3).
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-context-extractor/phase-1-database-connection-discovery
SKILL.md
Phase 1: Database Connection & Discovery (+3)
Phase 1: Database Connection & Discovery
Step 1: Identify the database type
Ask: "What data warehouse are you using?"
Common options:
- BigQuery
- Snowflake
- PostgreSQL/Redshift
- Databricks
Use ~~data warehouse tools (query and schema) to connect. If unclear, check available MCP tools in the current session.
Step 2: Explore the schema
Use ~~data warehouse schema tools to:
- List available datasets/schemas
- Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
- Pull schema details for those key tables
Sample exploration queries by dialect:
-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA
-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database
-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema
Phase 2: Core Questions (Ask These)
After schema discovery, ask these questions conversationally (not all at once):
Entity Disambiguation (Critical)
"When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"
Listen for:
- Multiple entity types (user vs account vs organization)
- Relationships between them (1:1, 1:many, many:many)
- Which ID fields link them together
Primary Identifiers
"What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"
Listen for:
- Primary keys vs business keys
- UUID vs integer IDs
- Legacy ID systems
Key Metrics
"What are the 2-3 metrics people ask about most? How is each one calculated?"
Listen for:
- Exact formulas (ARR = monthly_revenue x 12)
- Which tables/columns feed each metric
- Time period conventions (trailing 7 days, calendar month, etc.)
Data Hygiene
"What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"
Listen for:
- Standard WHERE clauses to always include
- Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
- Specific values to exclude (status = 'deleted')
Common Gotchas
"What mistakes do new analysts typically make with this data?"
Listen for:
- Confusing column names
- Timezone issues
- NULL handling quirks
- Historical vs current state tables
Phase 3: Generate the Skill
Create a skill with this structure:
[company]-data-analyst/
├── SKILL.md
└── references/
├── entities.md # Entity definitions and relationships
├── metrics.md # KPI calculations
├── tables/ # One file per domain
│ ├── [domain1].md
│ └── [domain2].md
└── dashboards.json # Optional: existing dashboards catalog
SKILL.md Template: See references/skill-template.md
SQL Dialect Section: See references/sql-dialects.md and include the appropriate dialect notes.
Reference File Template: See references/domain-template.md
Phase 4: Package and Deliver
- Create all files in the skill directory
- Package as a zip file
- Present to user with summary of what was captured
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
gsd-complete-milestone
Archive completed milestone and prepare for next version
gsd-reapply-patches
Reapply local modifications after a GSD update
gsd-verify-work
Validate built features through conversational UAT
gsd-thread
Manage persistent context threads for cross-session work
clinical-trial-protocol
Generate clinical trial protocols for medical devices or drugs through a modular, waypoint-based architecture with research-only and full protocol modes.
single-cell-rna-qc
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations.
Didn't find tool you were looking for?