Agent skill

schema-readme-generator

Use this skill to create or update README.md files for BigQuery ETL tables in the mozilla bigquery-etl repository. Follows layout conventions derived from comparing README files across the repo β€” rich style with emoji headings, Mermaid data flow diagram, graduated example queries, and concise metadata overview table. Requires schema.yaml with complete descriptions (run schema-enricher first if needed) and a complete metadata.yaml.

Stars 6
Forks 1

Install this agent skill to your Project

npx add-skill https://github.com/mozilla/bigquery-etl-skills/tree/main/skills/schema-readme-generator

SKILL.md

README Generator

Prerequisites: Run schema-enricher first if schema.yaml is missing descriptions; ensure metadata.yaml is present and complete. When to use: Creating or updating README.md for any shared dataset, derived table, or table with multiple downstream consumers

🚨 REQUIRED READING - Start Here

BEFORE generating any README, review the following:

  1. Layout conventions: READ references/layout_conventions.md

    • Section order, conciseness rules, anti-patterns to avoid
    • Information sources (which file to read for which section)
  2. README template: READ assets/readme_template.md and COPY its structure

    • Fill every {placeholder} from the source files
    • Do not skip or reorder sections

Workflow

Step 1: Read source files

Read all three files before writing anything:

sql/<project>/<dataset>/<table>/query.sql     β†’ source tables, GROUP BY dimensions, metrics, @param
sql/<project>/<dataset>/<table>/metadata.yaml β†’ DAG, partitioning, clustering, retention, owners
sql/<project>/<dataset>/<table>/schema.yaml   β†’ field names, types, descriptions for Key Fields section

If only query.py exists (no query.sql): note it β€” the Data Flow and How It Works sections may be incomplete or require manual input. Fill what is possible from metadata.yaml and schema.yaml.

Extract and record:

  • FROM clause β€” source table(s) with fully qualified name
  • GROUP BY fields β€” these become Dimensions
  • Aggregated fields β€” SUM/COUNT/DISTINCT targets become Metrics
  • WHERE clause β€” @param_name for Implementation Notes
  • DAG name, partition field, cluster fields, owners β€” for Overview table
  • Table version β€” from directory name (e.g., _v1)

Step 2: Check if README.md already exists

bash
ls sql/<project>/<dataset>/<table>/README.md
  • Exists β†’ read it, identify sections to update or add (do not remove existing content without noting it)
  • Does not exist β†’ generate from template

Step 3: Write README.md

READ assets/readme_template.md and fill every placeholder:

πŸ“Œ Overview table β€” use metadata.yaml for DAG/partition/cluster/retention/owner; derive Version from directory name.

πŸ—ΊοΈ Data Flow β€” Mermaid flowchart TD with exactly 3 nodes:

  • Node A: source table(s) with short label + fully qualified name
  • Node B: **This query** with filter and GROUP BY description
  • Node C: Partitioned table with time and cluster annotation
  • For multiple sources: A1, A2 β†’ B

🧠 How It Works β€” 4–5 numbered steps. Step 5 MUST explicitly state data inclusion/exclusion policy:

  • "All records from source are included; no exclusions applied at this layer."
  • OR list specific exclusions (bots, synthetic clients, test populations)

🧾 Key Fields β€” two sub-tables (Dimensions, Metrics). Use {a\|b\|c} shorthand for related field families. Group dimensions by: Date & Geo, Browser, Search, [Product] config, User. Omit dimension rows not applicable to this table.

🧩 Example Queries β€” exactly 3, graduated:

  1. Basic aggregation β€” date filter + 1–2 GROUP BY dimensions
  2. Segmentation β€” GROUP BY a user/product dimension with SAFE_DIVIDE ratio
  3. Attribution/Advanced β€” multi-metric, WHERE filter on a dimension, SAFE_DIVIDE

Rules:

  • Always use SAFE_DIVIDE() for ratios β€” never raw division
  • Use GROUP BY 1, 2 shorthand
  • Comment each: -- N. Description
  • Fully qualified table name in FROM

πŸ”§ Implementation Notes β€” 3–5 bullets extracted from query.sql logic.

πŸ“Œ Notes & Conventions β€” bullet definitions for key fields from schema.yaml descriptions.

πŸ—ƒοΈ Schema & Related Tables β€” one section; combine schema.yaml link + upstream + downstream.

Step 4: Conciseness check

Before finalizing, verify:

  • Total line count ≀ 170
  • No separate sections for Scheduling, Storage, Owners, Retention (all in Overview table)
  • No separate "Schema Reference" + "Related Tables" (merged into πŸ—ƒοΈ)
  • SQL examples use GROUP BY 1, 2 shorthand
  • How It Works uses single-line numbered steps (no multi-paragraph blocks)
  • How It Works Step 5 explicitly states data inclusion/exclusion policy

If over 170 lines, trim by: shortening SQL examples, collapsing Notes & Conventions bullets, abbreviating How It Works steps.

Step 5: Write and report

Write the README.md to:

sql/<project>/<dataset>/<table>/README.md

Then read back the written file and confirm:

  • All sections from the template are present and in order
  • No {placeholder} tokens remain unfilled (exception: if only query.py exists, Data Flow and How It Works may be partially filled β€” note which sections and why)
  • Line count is within target (≀ 170)
  • Mermaid block renders valid flowchart TD syntax

Report:

  • Path written
  • Line count
  • Sections included
  • Any placeholders left unfilled (with reason)

Integration with Other Skills

Skill When to invoke
schema-enricher Run first if schema.yaml is missing descriptions β€” needed for Notes & Conventions

Decision Tree: Rich vs. Minimal Style

Table has multiple downstream consumers OR is a shared dataset?
  β†’ Rich style (this skill)

Table is a UDF, static reference, or simple single-consumer table?
  β†’ Minimal style: title + ## Description with 5–10 bullet points
  β†’ Do not use this skill for minimal style

Example Invocations

Create a README.md for telemetry_derived.newtab_daily_interactions_aggregates_v1
Update the README.md for firefox_desktop_derived.newtab_clients_daily_v2 β€” add missing example queries
Generate README for ads_derived.impressions_v1

Expand your agent's capabilities with these related and highly-rated skills.

mozilla/bigquery-etl-skills

model-requirements

Use this skill when gathering requirements for new BigQuery data models OR when asked to edit existing queries in bqetl. For new models, guides structured requirements interviews. For existing queries, understands current model, checks downstream dependencies, and gathers requirements for changes. Works as pre-planning before query-writer skill.

6 1
Explore
mozilla/bigquery-etl-skills

metadata-manager

Use this skill when creating or updating DAG configurations (dags.yaml), schema.yaml, and metadata.yaml files for BigQuery tables. Handles creating new DAGs when needed and coordinates test updates when queries are modified (invokes sql-test-generator as needed). Works with bigquery-etl-core, query-writer, and sql-test-generator skills.

6 1
Explore
mozilla/bigquery-etl-skills

bigconfig-generator

Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with metadata-manager skill.

6 1
Explore
mozilla/bigquery-etl-skills

bigquery-etl-core

The core skill for working within the bigquery-etl repository. Use this skill when understanding project structure, conventions, and common patterns. Works with model-requirements, query-writer, metadata-manager, sql-test-generator, and bigconfig-generator skills.

6 1
Explore
mozilla/bigquery-etl-skills

query-writer

Use this skill when writing or updating SQL queries (query.sql) or Python ETL scripts (query.py) following Mozilla BigQuery ETL conventions. ALWAYS checks for and updates existing tests when modifying queries. Coordinates downstream updates to schemas and tests. Works with bigquery-etl-core, metadata-manager, and sql-test-generator skills.

6 1
Explore
mozilla/bigquery-etl-skills

sql-test-generator

ALWAYS use this skill when users ask to create, generate, or write UNIT TESTS for BigQuery SQL queries. Invoke proactively whenever the request includes "test" or "tests" with a query/table name. This skill is for unit testing ONLY (not data quality checks - use bigconfig-generator for Bigeye monitoring). Works with bigquery-etl-core skill to understand query patterns.

6 1
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results