data-engineering

Master data engineering, ETL/ELT, data warehousing, SQL optimization, and analytics. Use when building data pipelines, designing data systems, or working with large datasets.

Install this agent skill in your project

npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-typescript/tree/main/skills/data

SKILL.md

Data Engineering & Analytics Skill

Quick Start - SQL Data Pipeline

sql
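-- Rebuild the staging table so the script can be re-run
DROP TABLE IF EXISTS staging_events;

-- Stage the last day of click, purchase, and view events.
-- INTERVAL '1 day' is PostgreSQL-style syntax; some warehouses
-- (e.g. BigQuery) spell intervals differently.
CREATE TABLE staging_events AS
SELECT
  event_id,
  user_id,
  event_type,
  event_time,
  properties
FROM raw_events
WHERE event_time >= CURRENT_DATE - INTERVAL '1 day'
  AND event_type IN ('click', 'purchase', 'view');

-- Aggregate daily metrics per user
SELECT
  DATE(event_time) AS event_date,
  user_id,
  COUNT(*) AS event_count,
  COUNT(DISTINCT event_type) AS unique_event_types
FROM staging_events
GROUP BY 1, 2
ORDER BY event_date DESC, event_count DESC;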
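
The quick start can also be exercised locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a raw_events table with the five columns used above. The function name run_pipeline, the cutoff parameter, and the column types are hypothetical. SQLite has no INTERVAL syntax, so the staging step takes the cutoff as a bound parameter and uses an explicit table with INSERT ... SELECT instead of CREATE TABLE AS.

```python
import sqlite3


def run_pipeline(conn: sqlite3.Connection, cutoff: str) -> list:
    """Rebuild staging_events from raw_events and return daily per-user metrics.

    `cutoff` is an ISO-8601 timestamp string such as '2024-01-01 00:00:00';
    only events at or after it are staged.
    """
    # Column types are assumptions; the original does not define a schema.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS staging_events (
            event_id INTEGER, user_id INTEGER, event_type TEXT,
            event_time TEXT, properties TEXT
        )
        """
    )
    # Clearing the table first keeps the step safe to re-run.
    conn.execute("DELETE FROM staging_events")
    conn.execute(
        """
        INSERT INTO staging_events
        SELECT event_id, user_id, event_type, event_time, properties
        FROM raw_events
        WHERE event_time >= ?
          AND event_type IN ('click', 'purchase', 'view')
        """,
        (cutoff,),
    )
    conn.commit()
    # Same aggregation as the quick start, one row per (day, user).
    return conn.execute(
        """
        SELECT date(event_time) AS event_date, user_id,
               COUNT(*) AS event_count,
               COUNT(DISTINCT event_type) AS unique_event_types
        FROM staging_events
        GROUP BY date(event_time), user_id
        ORDER BY event_date DESC, event_count DESC
        """
    ).fetchall()
```

Calling run_pipeline(conn, '2024-01-01 00:00:00') on a connection that already holds raw_events returns a list of (event_date, user_id, event_count, unique_event_types) tuples. Running it twice gives the same result, because the staging table is cleared before each load.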

Core Technologies

Data Processing

  • Apache Spark
  • Apache Flink
  • Pandas / Polars
  • dbt (data transformation)

Data Warehousing

  • Snowflake
  • BigQuery (GCP)
  • Redshift (AWS)
  • Azure Synapse

ETL/ELT Tools

  • dbt
  • Airflow
  • Talend
  • Informatica

Streaming

  • Apache Kafka
  • AWS Kinesis
  • Apache Pulsar

ML & Analytics

  • scikit-learn
  • TensorFlow
  • Tableau / Power BI

Best Practices

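  1. Data Quality - Validate schemas, nulls, and uniqueness at each stage, and fail the run when a check breaks
  2. Documentation - Record owners, column meanings, and lineage alongside the models
  3. Performance - Filter early, select only the columns you need, and review query plans for slow queries
  4. Governance - Restrict access by role and mask or exclude sensitive fields
  5. Monitoring - Alert on failed runs, late data, and unexpected row-count changes
  6. Scalability - Prefer incremental loads over full reloads as data volume grows
  7. Version Control - Keep pipeline code and configs in Git
  8. Testing - Unit-test transformations and run pipelines end to end on sample data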
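
The data quality item can be made concrete with a small check that runs before a load. This is a minimal sketch in plain Python, assuming events arrive as dictionaries with event_id, user_id, and event_type keys. The function name validate_events and the ALLOWED_EVENT_TYPES set are hypothetical, and the allowed types simply mirror the quick start filter.

```python
from collections import Counter

# Assumption: the same three event types the quick start keeps.
ALLOWED_EVENT_TYPES = {"click", "purchase", "view"}


def validate_events(rows):
    """Return a list of data-quality failures; an empty list means all checks passed."""
    failures = []

    # Uniqueness: each event_id should appear once.
    counts = Counter(row["event_id"] for row in rows)
    duplicates = sorted(eid for eid, n in counts.items() if n > 1)
    if duplicates:
        failures.append(f"duplicate event_id: {duplicates}")

    # Completeness: user_id must be present.
    missing = [row["event_id"] for row in rows if row.get("user_id") is None]
    if missing:
        failures.append(f"null user_id in events: {missing}")

    # Validity: event_type must be one of the expected values.
    unexpected = sorted({row["event_type"] for row in rows} - ALLOWED_EVENT_TYPES)
    if unexpected:
        failures.append(f"unexpected event_type: {unexpected}")

    return failures
```

A pipeline step could call validate_events on each batch and stop the run when the returned list is not empty, which also gives the monitoring item a concrete signal to alert on.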

Resources
