Agent skill
schema-readme-generator
Use this skill to create or update README.md files for BigQuery ETL tables in the mozilla bigquery-etl repository. Follows layout conventions derived from comparing README files across the repo β rich style with emoji headings, Mermaid data flow diagram, graduated example queries, and concise metadata overview table. Requires schema.yaml with complete descriptions (run schema-enricher first if needed) and a complete metadata.yaml.
Install this agent skill to your Project
npx add-skill https://github.com/mozilla/bigquery-etl-skills/tree/main/skills/schema-readme-generator
SKILL.md
README Generator
Prerequisites: Run schema-enricher first if schema.yaml is missing descriptions; ensure metadata.yaml is present and complete.
When to use: Creating or updating README.md for any shared dataset, derived table, or table with multiple downstream consumers
π¨ REQUIRED READING - Start Here
BEFORE generating any README, review the following:
-
Layout conventions: READ
references/layout_conventions.md- Section order, conciseness rules, anti-patterns to avoid
- Information sources (which file to read for which section)
-
README template: READ
assets/readme_template.mdand COPY its structure- Fill every
{placeholder}from the source files - Do not skip or reorder sections
- Fill every
Workflow
Step 1: Read source files
Read all three files before writing anything:
sql/<project>/<dataset>/<table>/query.sql β source tables, GROUP BY dimensions, metrics, @param
sql/<project>/<dataset>/<table>/metadata.yaml β DAG, partitioning, clustering, retention, owners
sql/<project>/<dataset>/<table>/schema.yaml β field names, types, descriptions for Key Fields section
If only query.py exists (no query.sql): note it β the Data Flow and How It Works sections may be incomplete or require manual input. Fill what is possible from metadata.yaml and schema.yaml.
Extract and record:
- FROM clause β source table(s) with fully qualified name
- GROUP BY fields β these become Dimensions
- Aggregated fields β SUM/COUNT/DISTINCT targets become Metrics
- WHERE clause β
@param_namefor Implementation Notes - DAG name, partition field, cluster fields, owners β for Overview table
- Table version β from directory name (e.g.,
_v1)
Step 2: Check if README.md already exists
ls sql/<project>/<dataset>/<table>/README.md
- Exists β read it, identify sections to update or add (do not remove existing content without noting it)
- Does not exist β generate from template
Step 3: Write README.md
READ assets/readme_template.md and fill every placeholder:
π Overview table β use metadata.yaml for DAG/partition/cluster/retention/owner; derive Version from directory name.
πΊοΈ Data Flow β Mermaid flowchart TD with exactly 3 nodes:
- Node A: source table(s) with short label + fully qualified name
- Node B:
**This query**with filter and GROUP BY description - Node C:
Partitioned tablewith time and cluster annotation - For multiple sources: A1, A2 β B
π§ How It Works β 4β5 numbered steps. Step 5 MUST explicitly state data inclusion/exclusion policy:
- "All records from source are included; no exclusions applied at this layer."
- OR list specific exclusions (bots, synthetic clients, test populations)
π§Ύ Key Fields β two sub-tables (Dimensions, Metrics). Use {a\|b\|c} shorthand for related field families. Group dimensions by: Date & Geo, Browser, Search, [Product] config, User. Omit dimension rows not applicable to this table.
π§© Example Queries β exactly 3, graduated:
- Basic aggregation β date filter + 1β2 GROUP BY dimensions
- Segmentation β GROUP BY a user/product dimension with SAFE_DIVIDE ratio
- Attribution/Advanced β multi-metric, WHERE filter on a dimension, SAFE_DIVIDE
Rules:
- Always use
SAFE_DIVIDE()for ratios β never raw division - Use
GROUP BY 1, 2shorthand - Comment each:
-- N. Description - Fully qualified table name in FROM
π§ Implementation Notes β 3β5 bullets extracted from query.sql logic.
π Notes & Conventions β bullet definitions for key fields from schema.yaml descriptions.
ποΈ Schema & Related Tables β one section; combine schema.yaml link + upstream + downstream.
Step 4: Conciseness check
Before finalizing, verify:
- Total line count β€ 170
- No separate sections for Scheduling, Storage, Owners, Retention (all in Overview table)
- No separate "Schema Reference" + "Related Tables" (merged into ποΈ)
- SQL examples use
GROUP BY 1, 2shorthand - How It Works uses single-line numbered steps (no multi-paragraph blocks)
- How It Works Step 5 explicitly states data inclusion/exclusion policy
If over 170 lines, trim by: shortening SQL examples, collapsing Notes & Conventions bullets, abbreviating How It Works steps.
Step 5: Write and report
Write the README.md to:
sql/<project>/<dataset>/<table>/README.md
Then read back the written file and confirm:
- All sections from the template are present and in order
- No
{placeholder}tokens remain unfilled (exception: if onlyquery.pyexists, Data Flow and How It Works may be partially filled β note which sections and why) - Line count is within target (β€ 170)
- Mermaid block renders valid
flowchart TDsyntax
Report:
- Path written
- Line count
- Sections included
- Any placeholders left unfilled (with reason)
Integration with Other Skills
| Skill | When to invoke |
|---|---|
schema-enricher |
Run first if schema.yaml is missing descriptions β needed for Notes & Conventions |
Decision Tree: Rich vs. Minimal Style
Table has multiple downstream consumers OR is a shared dataset?
β Rich style (this skill)
Table is a UDF, static reference, or simple single-consumer table?
β Minimal style: title + ## Description with 5β10 bullet points
β Do not use this skill for minimal style
Example Invocations
Create a README.md for telemetry_derived.newtab_daily_interactions_aggregates_v1
Update the README.md for firefox_desktop_derived.newtab_clients_daily_v2 β add missing example queries
Generate README for ads_derived.impressions_v1
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
model-requirements
Use this skill when gathering requirements for new BigQuery data models OR when asked to edit existing queries in bqetl. For new models, guides structured requirements interviews. For existing queries, understands current model, checks downstream dependencies, and gathers requirements for changes. Works as pre-planning before query-writer skill.
metadata-manager
Use this skill when creating or updating DAG configurations (dags.yaml), schema.yaml, and metadata.yaml files for BigQuery tables. Handles creating new DAGs when needed and coordinates test updates when queries are modified (invokes sql-test-generator as needed). Works with bigquery-etl-core, query-writer, and sql-test-generator skills.
bigconfig-generator
Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with metadata-manager skill.
bigquery-etl-core
The core skill for working within the bigquery-etl repository. Use this skill when understanding project structure, conventions, and common patterns. Works with model-requirements, query-writer, metadata-manager, sql-test-generator, and bigconfig-generator skills.
query-writer
Use this skill when writing or updating SQL queries (query.sql) or Python ETL scripts (query.py) following Mozilla BigQuery ETL conventions. ALWAYS checks for and updates existing tests when modifying queries. Coordinates downstream updates to schemas and tests. Works with bigquery-etl-core, metadata-manager, and sql-test-generator skills.
sql-test-generator
ALWAYS use this skill when users ask to create, generate, or write UNIT TESTS for BigQuery SQL queries. Invoke proactively whenever the request includes "test" or "tests" with a query/table name. This skill is for unit testing ONLY (not data quality checks - use bigconfig-generator for Bigeye monitoring). Works with bigquery-etl-core skill to understand query patterns.
Didn't find tool you were looking for?