Agent skill

aps-doc-staging

Expert documentation generation for staging transformation layers. Auto-detects SQL engine (Presto/Trino vs Hive), documents transformation rules, PII handling, deduplication strategies, and data quality rules. Use when documenting staging transformations.


Install this agent skill to your Project

npx add-skill https://github.com/treasure-data/td-skills/tree/main/aps-doc-skills/staging

SKILL.md

APS Staging Transformation Documentation Expert

Specialized skill for generating comprehensive documentation for staging transformation layers. Automatically detects SQL engines, extracts transformation rules, documents PII handling, and analyzes deduplication strategies.
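The skill text does not say how engine detection works. One plausible approach, sketched below as an assumption rather than the skill's actual method, is to count dialect-specific constructs in each SQL file: `LATERAL VIEW`, `get_json_object`, and `INSERT OVERWRITE` suggest Hive, while `json_extract_scalar`, `CROSS JOIN UNNEST`, and `TRY_CAST` suggest Presto/Trino. The function name and marker lists are hypothetical.

```python
import re

# Hypothetical marker lists: constructs that appear in one dialect but not the other.
HIVE_MARKERS = [
    r"\blateral\s+view\b",
    r"\bget_json_object\s*\(",
    r"\binsert\s+overwrite\b",
]
PRESTO_MARKERS = [
    r"\bjson_extract_scalar\s*\(",
    r"\bcross\s+join\s+unnest\b",
    r"\btry_cast\s*\(",
]


def detect_engine(sql: str) -> str:
    """Guess the SQL engine from dialect-specific constructs (heuristic only)."""
    text = sql.lower()
    hive_hits = sum(bool(re.search(p, text)) for p in HIVE_MARKERS)
    presto_hits = sum(bool(re.search(p, text)) for p in PRESTO_MARKERS)
    if hive_hits > presto_hits:
        return "Hive"
    if presto_hits > hive_hits:
        return "Presto/Trino"
    # No markers, or a tie: leave the decision to a human or to workflow metadata.
    return "unknown"
```

A query that uses only portable SQL returns `"unknown"`, in which case the engine would have to be read from the workflow definition instead.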

When to Use This Skill

Use this skill when:

- Documenting staging transformation workflows
- Creating documentation for data cleaning and standardization logic
- Documenting PII handling and security transformations
- Creating documentation for deduplication strategies
- Documenting data quality rules and validations
- Generating documentation for Presto/Trino or Hive transformations

Example requests:

- "Document the staging transformation for customer events"
- "Create staging layer documentation with transformation rules"
- "Document PII handling in staging transformations"
- "Generate staging documentation following this template: [Confluence URL]"

🚨 MANDATORY: Codebase Access Required

WITHOUT codebase access = NO documentation. Period.

If no codebase access is provided, respond with:

I cannot create technical documentation without codebase access.

Required:
- Directory path to staging workflows
- Access to .dig, .sql, .yml files

Without access, I cannot extract real transformation SQL, PII logic, or table names.
Provide path: "Code is in /path/to/staging/"

Before proceeding:

- Ask for the codebase path if it was not provided
- Use Glob to verify that SQL files exist
- STOP if the files cannot be read
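The pre-flight check above relies on the agent's Glob tool. The same idea can be sketched in plain Python with `pathlib`, assuming the three extensions named earlier (`.dig`, `.sql`, `.yml`); the function name `find_staging_files` is hypothetical and not part of the skill.

```python
from pathlib import Path

# Extensions the skill says it needs access to.
REQUIRED_SUFFIXES = (".dig", ".sql", ".yml")


def find_staging_files(root: str) -> dict[str, list[Path]]:
    """Collect workflow, SQL, and config files under root; stop if none are readable."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"Staging directory not found: {base}")
    # Recursive glob per extension, sorted for stable output.
    found = {suffix: sorted(base.rglob(f"*{suffix}")) for suffix in REQUIRED_SUFFIXES}
    if not found[".sql"]:
        # Mirrors the STOP rule: without SQL files there is nothing to document.
        raise FileNotFoundError(f"No .sql files under {base}; stopping.")
    return found
```

Raising an exception rather than returning an empty result keeps the "STOP" rule explicit: nothing downstream runs on an empty codebase.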

Documentation MUST contain:

- Real transformation SQL from .sql files
- Actual PII hashing/masking logic
- Real table/column names
- Working SQL examples from code

NO generic placeholders. Only real, extracted data.

REQUIRED Documentation Template

Follow this EXACT structure (derived from production examples):

markdown

# Staging Transformation - {Engine} Engine

## Overview
**Engine**: {Presto/Trino or Hive}
**Architecture**: {Loop-based / Other}
**Processing Mode**: {Incremental / Full}
**Location**: {directory path}

### Key Characteristics
{List key features from actual workflow}

---

## Architecture Overview

### Directory Structure
{Actual directory tree from codebase}

### Core Components

#### 1. Main Workflow File
{Name and purpose}

**Key Features:**
- {Feature from actual .dig file}
- {Feature from actual .dig file}

**Workflow Phases:**
{Extract from actual workflow}

#### 2. Configuration File
{Name and structure from actual codebase}

**Configuration Structure:**
{Real YAML structure}

**Table Configuration Fields:**
{Document actual fields used}

#### 3. SQL Transformation Files
{Types: init, incremental, upsert - from actual codebase}

---

## Processing Flow

### Initial Load (First Run)
{Step-by-step from actual workflow}

### Incremental Load (Subsequent Runs)
{Step-by-step from actual workflow}

---

## Data Transformation Rules

{Document ACTUAL transformation rules from codebase}

### 1. Date/Timestamp Processing
{Real SQL examples from transformation files}

### 2. String Standardization
{Real SQL examples}

### 3. JSON Extraction
{Real examples, if any exist}

### 4. Email Processing
{Real examples, if any exist}

### 5. Phone Number Processing
{Real examples, if any exist}

### 6. Deduplication Logic
{Real ROW_NUMBER() or DISTINCT logic}

### 7. Metadata Columns
{Real source_system, load_timestamp columns}

---

## Table-Specific Transformation Rules

{If using reference table like staging_trnsfrm_rules:}

**Reference Table**: {database}.{table}
**Purpose**: {explain}

**Schema**: {real schema}

**How Used**: {explain how workflow reads these rules}

---

## Current Implementation

**Configured Tables**:
{List actual tables from config}

---

## How to Add New Source Tables

{Step-by-step with real examples}

---

## Monitoring & Troubleshooting

**Key Queries**:
{Real SQL for checking status, data quality}

**Common Issues**:
{Real issues and solutions}

---

## Best Practices

{List from actual production experience}

---

## Summary

{Brief recap of capabilities}

Template Usage Notes:

- Read actual workflows (.dig), configs (.yml), and SQL files
- Extract REAL transformation logic from SQL
- Document REAL deduplication strategies
- Use actual table/column names from the codebase
- Include working SQL examples
- NO placeholders - only real extracted data
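The "no placeholders" rule can be checked mechanically. The following is a minimal sketch, assuming unfilled template slots keep the `{...}` form used in the template above; the function name is hypothetical. Workflow variables written as `${...}` are skipped so that real SQL quoted in the document is not flagged.

```python
import re

# A brace group on one line that is not preceded by "$" (so ${var} is ignored).
PLACEHOLDER = re.compile(r"(?<!\$)\{[^{}\n]*\}")


def find_unfilled_placeholders(doc: str) -> list[tuple[int, str]]:
    """Return (line number, placeholder text) for every leftover template slot."""
    hits = []
    for lineno, line in enumerate(doc.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

This check can produce false positives when the documentation legitimately contains braces, for example inline JSON, so any hits are best reviewed rather than rejected automatically.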

Summary

This skill generates production-ready staging documentation by:

- Reading actual .dig workflows, .yml configs, and .sql files
- Following the exact template structure shown above
- Extracting real transformation rules from SQL
- Documenting actual deduplication logic
- Creating comprehensive documentation with working SQL examples
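As an illustration of what extracting deduplication logic might involve, the sketch below pulls the partition keys and ordering out of `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` clauses. It is an assumption about one workable approach, not the skill's implementation: the function name is hypothetical, and the pattern deliberately ignores `ORDER BY` expressions that contain parentheses.

```python
import re

# Matches ROW_NUMBER() OVER (PARTITION BY <keys> ORDER BY <order>) with simple operands.
DEDUP = re.compile(
    r"row_number\s*\(\s*\)\s*over\s*\(\s*partition\s+by\s+(?P<keys>.+?)"
    r"\s+order\s+by\s+(?P<order>[^()]+?)\s*\)",
    re.IGNORECASE | re.DOTALL,
)


def extract_dedup_rules(sql: str) -> list[dict]:
    """List the partition keys and ordering of each ROW_NUMBER() dedup clause."""
    rules = []
    for match in DEDUP.finditer(sql):
        keys = [key.strip() for key in match.group("keys").split(",")]
        # Collapse internal whitespace so multi-line clauses print on one line.
        order = " ".join(match.group("order").split())
        rules.append({"partition_by": keys, "order_by": order})
    return rules
```

The extracted keys and ordering would still need to be quoted alongside the original SQL, since the documentation is meant to show the real query text.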

Key capability: Transforms a staging codebase into Confluence-ready documentation that covers every transformation rule found in the code.

Maintainer

treasure-data Core maintainer

Source details

Full Name: treasure-data/td-skills
Branch: main
Path in repo: aps-doc-skills/staging
