Agent skill

data-context-extractor-phase-1-database-connection-discovery

Sub-skill of data-context-extractor: Phase 1: Database Connection & Discovery (+3).

Stars 4
Forks 4

Install this agent skill to your Project

npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/analytics/data-context-extractor/phase-1-database-connection-discovery

SKILL.md

Phase 1: Database Connection & Discovery (+3)

Phase 1: Database Connection & Discovery

Step 1: Identify the database type

Ask: "What data warehouse are you using?"

Common options:

  • BigQuery
  • Snowflake
  • PostgreSQL/Redshift
  • Databricks

Use ~~data warehouse tools (query and schema) to connect. If unclear, check available MCP tools in the current session.

Step 2: Explore the schema

Use ~~data warehouse schema tools to:

  1. List available datasets/schemas
  2. Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
  3. Pull schema details for those key tables

Sample exploration queries by dialect:

sql
-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema

Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

Entity Disambiguation (Critical)

"When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:

  • Multiple entity types (user vs account vs organization)
  • Relationships between them (1:1, 1:many, many:many)
  • Which ID fields link them together

Primary Identifiers

"What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:

  • Primary keys vs business keys
  • UUID vs integer IDs
  • Legacy ID systems

Key Metrics

"What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:

  • Exact formulas (ARR = monthly_revenue x 12)
  • Which tables/columns feed each metric
  • Time period conventions (trailing 7 days, calendar month, etc.)

Data Hygiene

"What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:

  • Standard WHERE clauses to always include
  • Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
  • Specific values to exclude (status = 'deleted')

Common Gotchas

"What mistakes do new analysts typically make with this data?"

Listen for:

  • Confusing column names
  • Timezone issues
  • NULL handling quirks
  • Historical vs current state tables

Phase 3: Generate the Skill

Create a skill with this structure:

[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md          # Entity definitions and relationships
    ├── metrics.md           # KPI calculations
    ├── tables/              # One file per domain
    │   ├── [domain1].md
    │   └── [domain2].md
    └── dashboards.json      # Optional: existing dashboards catalog

SKILL.md Template: See references/skill-template.md

SQL Dialect Section: See references/sql-dialects.md and include the appropriate dialect notes.

Reference File Template: See references/domain-template.md

Phase 4: Package and Deliver

  1. Create all files in the skill directory
  2. Package as a zip file
  3. Present to user with summary of what was captured

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results