Agent skill
SQL for Analytics
Writing SQL queries for analytics including aggregations, window functions, CTEs, and complex joins to extract insights from large datasets efficiently.
Stars
163
Forks
31
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/sql-for-analytics
SKILL.md
SQL for Analytics
Current Level: Intermediate
Domain: Business Analytics / Data Engineering
Overview
Analytics SQL focuses on aggregations, analysis, and insights rather than transactional operations. Effective analytics SQL uses window functions, CTEs, complex joins, and optimization techniques to query large datasets efficiently.
Analytics SQL vs Transactional SQL
| Aspect | Transactional SQL | Analytics SQL |
|---|---|---|
| Purpose | CRUD operations | Aggregations, analysis |
| Focus | Single record | Many records |
| Operations | INSERT, UPDATE, DELETE | SELECT, GROUP BY, JOIN |
| Performance | Fast queries | Complex queries |
| Example | UPDATE users SET status = 'active' |
SELECT COUNT(*) FROM users GROUP BY status |
Core Analytics Queries
Aggregations
sql
-- Basic aggregations
SELECT
COUNT(*) AS total_orders,
SUM(revenue) AS total_revenue,
AVG(revenue) AS avg_revenue,
MIN(revenue) AS min_revenue,
MAX(revenue) AS max_revenue
FROM orders;
-- Conditional aggregations
SELECT
COUNT(*) AS total_orders,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_orders,
SUM(CASE WHEN status = 'completed' THEN revenue END) AS completed_revenue
FROM orders;
GROUP BY
sql
-- Group by single dimension
SELECT
country,
COUNT(*) AS orders,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY country
ORDER BY total_revenue DESC;
-- Group by multiple dimensions
SELECT
country,
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY country, DATE_TRUNC('month', order_date)
ORDER BY country, month;
HAVING
sql
-- Filter aggregated results
SELECT
country,
COUNT(*) AS orders,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY country
HAVING SUM(revenue) > 100000
ORDER BY total_revenue DESC;
DISTINCT
sql
-- Count unique values
SELECT
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT country) AS unique_countries
FROM orders;
-- Distinct combinations
SELECT DISTINCT
user_id,
country
FROM orders
ORDER BY user_id, country;
Window Functions
ROW_NUMBER()
sql
-- Rank rows within groups
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_rank
FROM orders;
RANK() and DENSE_RANK()
sql
-- Rank with ties
SELECT
product_id,
revenue,
RANK() OVER (ORDER BY revenue DESC) AS rank_rank,
DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rank
FROM products;
| Revenue | RANK() | DENSE_RANK() |
|---|---|---|
| 100 | 1 | 1 |
| 100 | 1 | 1 |
| 90 | 3 | 2 |
| 80 | 4 | 3 |
LAG() and LEAD()
sql
-- Compare with previous/next row
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue,
LAG(SUM(revenue), 1) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS prev_month_revenue,
SUM(revenue) - LAG(SUM(revenue), 1) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS revenue_change
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Running Totals
sql
-- Cumulative sum
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue,
SUM(SUM(revenue)) OVER (
ORDER BY DATE_TRUNC('month', order_date)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Moving Averages
sql
-- 7-day moving average
SELECT
order_date,
SUM(revenue) AS daily_revenue,
AVG(SUM(revenue)) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7day
FROM orders
GROUP BY order_date
ORDER BY order_date;
Date/Time Analysis
DATE_TRUNC
sql
-- Group by time periods
SELECT
DATE_TRUNC('day', order_date) AS day,
SUM(revenue) AS daily_revenue
FROM orders
GROUP BY DATE_TRUNC('day', order_date)
ORDER BY day;
SELECT
DATE_TRUNC('week', order_date) AS week,
SUM(revenue) AS weekly_revenue
FROM orders
GROUP BY DATE_TRUNC('week', order_date)
ORDER BY week;
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
SELECT
DATE_TRUNC('quarter', order_date) AS quarter,
SUM(revenue) AS quarterly_revenue
FROM orders
GROUP BY DATE_TRUNC('quarter', order_date)
ORDER BY quarter;
SELECT
DATE_TRUNC('year', order_date) AS year,
SUM(revenue) AS yearly_revenue
FROM orders
GROUP BY DATE_TRUNC('year', order_date)
ORDER BY year;
EXTRACT
sql
-- Extract date parts
SELECT
order_date,
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
EXTRACT(DAY FROM order_date) AS day,
EXTRACT(HOUR FROM order_date) AS hour,
EXTRACT(DOW FROM order_date) AS day_of_week
FROM orders;
Date Arithmetic
sql
-- Date differences
SELECT
user_id,
signup_date,
first_purchase_date,
EXTRACT(DAY FROM (first_purchase_date - signup_date)) AS days_to_purchase
FROM users
WHERE first_purchase_date IS NOT NULL;
-- Date addition
SELECT
order_date,
order_date + INTERVAL '30 days' AS due_date,
order_date + INTERVAL '1 month' AS next_month,
order_date + INTERVAL '1 year' AS next_year
FROM orders;
Common Analytics Patterns
Daily Active Users (DAU)
sql
SELECT
event_date,
COUNT(DISTINCT user_id) AS dau
FROM events
WHERE event_type = 'page_view'
GROUP BY event_date
ORDER BY event_date;
Monthly Recurring Revenue (MRR)
sql
SELECT
DATE_TRUNC('month', start_date) AS month,
SUM(amount) AS mrr
FROM subscriptions
WHERE status = 'active'
AND billing_period = 'monthly'
GROUP BY DATE_TRUNC('month', start_date)
ORDER BY month;
Cohort Retention
sql
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', signup_date) AS cohort_month
FROM users
),
user_activities AS (
SELECT
user_id,
DATE_TRUNC('month', event_date) AS activity_month
FROM events
WHERE event_type = 'page_view'
)
SELECT
c.cohort_month,
ua.activity_month,
COUNT(DISTINCT ua.user_id) AS retained_users,
COUNT(DISTINCT c.user_id) AS cohort_size,
100.0 * COUNT(DISTINCT ua.user_id) / COUNT(DISTINCT c.user_id) AS retention_pct
FROM cohorts c
LEFT JOIN user_activities ua ON c.user_id = ua.user_id
GROUP BY c.cohort_month, ua.activity_month
ORDER BY c.cohort_month, ua.activity_month;
Conversion Funnel
sql
WITH funnel_steps AS (
SELECT
session_id,
'visit' AS step,
1 AS step_number
FROM page_views
WHERE page_url = '/'
UNION ALL
SELECT
session_id,
'product_view' AS step,
2 AS step_number
FROM page_views
WHERE page_url LIKE '/product/%'
UNION ALL
SELECT
session_id,
'add_to_cart' AS step,
3 AS step_number
FROM events
WHERE event_type = 'add_to_cart'
UNION ALL
SELECT
session_id,
'purchase' AS step,
4 AS step_number
FROM events
WHERE event_type = 'purchase'
)
SELECT
step,
step_number,
COUNT(DISTINCT session_id) AS users,
LAG(COUNT(DISTINCT session_id)) OVER (ORDER BY step_number) AS previous_users,
ROUND(100.0 * COUNT(DISTINCT session_id) / LAG(COUNT(DISTINCT session_id)) OVER (ORDER BY step_number), 2) AS conversion_rate
FROM funnel_steps
GROUP BY step, step_number
ORDER BY step_number;
Customer LTV
sql
WITH user_revenue AS (
SELECT
user_id,
SUM(revenue) AS total_revenue,
COUNT(*) AS order_count,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
GROUP BY user_id
)
SELECT
AVG(total_revenue) AS avg_ltv,
AVG(order_count) AS avg_orders,
AVG(EXTRACT(DAY FROM (last_order - first_order))) AS avg_lifetime_days
FROM user_revenue;
Year-Over-Year Growth
sql
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month_num,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
year,
month_num,
revenue,
LAG(revenue, 12) OVER (ORDER BY year, month_num) AS revenue_prev_year,
ROUND(100.0 * (revenue - LAG(revenue, 12) OVER (ORDER BY year, month_num)) /
LAG(revenue, 12) OVER (ORDER BY year, month_num), 2) AS yoy_growth
FROM monthly_revenue
ORDER BY year, month_num;
CTEs (Common Table Expressions)
Basic CTE
sql
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
revenue,
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS ma_3month
FROM monthly_revenue
ORDER BY month;
Multiple CTEs
sql
WITH user_orders AS (
SELECT
user_id,
COUNT(*) AS order_count,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id
),
user_segments AS (
SELECT
user_id,
order_count,
total_revenue,
CASE
WHEN total_revenue > 1000 THEN 'high_value'
WHEN total_revenue > 500 THEN 'medium_value'
ELSE 'low_value'
END AS segment
FROM user_orders
)
SELECT
segment,
COUNT(*) AS user_count,
AVG(order_count) AS avg_orders,
AVG(total_revenue) AS avg_revenue
FROM user_segments
GROUP BY segment
ORDER BY avg_revenue DESC;
JOINs for Analytics
LEFT JOIN (Keep all from left)
sql
-- All users, with orders if they have any
SELECT
u.user_id,
u.signup_date,
COUNT(o.order_id) AS order_count,
SUM(o.revenue) AS total_revenue
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.signup_date;
INNER JOIN (Only matches)
sql
-- Only users who have made orders
SELECT
u.user_id,
u.signup_date,
COUNT(o.order_id) AS order_count,
SUM(o.revenue) AS total_revenue
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.signup_date;
SELF JOIN (Compare rows in same table)
sql
-- Find users who made multiple orders in same day
SELECT
o1.user_id,
o1.order_date,
COUNT(*) AS orders_same_day
FROM orders o1
INNER JOIN orders o2 ON o1.user_id = o2.user_id
AND DATE(o1.order_date) = DATE(o2.order_date)
GROUP BY o1.user_id, o1.order_date
HAVING COUNT(*) > 1;
Multiple JOINs
sql
-- Join users, orders, and products
SELECT
u.user_id,
u.country,
COUNT(o.order_id) AS order_count,
SUM(o.revenue) AS total_revenue,
AVG(p.price) AS avg_product_price
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY u.user_id, u.country
ORDER BY total_revenue DESC;
CASE Statements
Conditional Logic
sql
-- Bucketing
SELECT
user_id,
revenue,
CASE
WHEN revenue < 50 THEN 'low'
WHEN revenue < 100 THEN 'medium'
WHEN revenue < 200 THEN 'high'
ELSE 'very_high'
END AS revenue_tier
FROM orders;
Pivot-like Transformation
sql
-- Pivot months to columns
SELECT
product_id,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 1 THEN revenue END) AS jan_revenue,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 2 THEN revenue END) AS feb_revenue,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 3 THEN revenue END) AS mar_revenue,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 4 THEN revenue END) AS apr_revenue
FROM orders
GROUP BY product_id;
Subqueries
Scalar Subquery
sql
-- Single value
SELECT
user_id,
revenue,
(SELECT AVG(revenue) FROM orders) AS avg_revenue,
revenue - (SELECT AVG(revenue) FROM orders) AS revenue_diff
FROM orders;
Correlated Subquery
sql
-- Subquery references outer query
SELECT
user_id,
order_date,
revenue,
(SELECT AVG(revenue)
FROM orders o2
WHERE o2.user_id = o1.user_id
) AS user_avg_revenue
FROM orders o1;
Subquery in FROM
sql
-- Use subquery as table
SELECT
month,
revenue,
ma_3month
FROM (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS revenue,
AVG(SUM(revenue)) OVER (
ORDER BY DATE_TRUNC('month', order_date)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS ma_3month
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
) monthly_revenue
ORDER BY month;
Performance Optimization
EXPLAIN
sql
-- Analyze query plan
EXPLAIN ANALYZE
SELECT
user_id,
COUNT(*) AS order_count,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id;
Indexes
sql
-- Create index on filter column
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Create index on join column
CREATE INDEX idx_orders_product_id ON orders(product_id);
-- Create composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);
-- Create index on date for range queries
CREATE INDEX idx_orders_date ON orders(order_date);
Avoid SELECT *
sql
-- Bad: Select all columns
SELECT * FROM orders;
-- Good: Select only needed columns
SELECT order_id, user_id, revenue, order_date
FROM orders;
Filter Early
sql
-- Bad: Filter after JOIN
SELECT
u.user_id,
COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_date >= '2024-01-01'
GROUP BY u.user_id;
-- Good: Filter before JOIN
SELECT
u.user_id,
COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN (
SELECT order_id, user_id
FROM orders
WHERE order_date >= '2024-01-01'
) o ON u.user_id = o.user_id
GROUP BY u.user_id;
Advanced Aggregations
PERCENTILE_CONT()
sql
-- Median and quartiles
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) AS median_revenue,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS q25_revenue,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS q75_revenue
FROM orders;
STRING_AGG() / ARRAY_AGG()
sql
-- Concatenate values
SELECT
user_id,
STRING_AGG(product_name, ', ') AS products_purchased
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY user_id;
-- Aggregate to array
SELECT
user_id,
ARRAY_AGG(product_id) AS product_ids
FROM orders
GROUP BY user_id;
FILTER Clause (PostgreSQL)
sql
-- Conditional aggregation
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'completed') AS completed_orders,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders,
SUM(revenue) FILTER (WHERE status = 'completed') AS completed_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
ROLLUP / CUBE
sql
-- Subtotals with ROLLUP
SELECT
country,
city,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY ROLLUP(country, city)
ORDER BY country, city;
-- All combinations with CUBE
SELECT
country,
city,
product_category,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY CUBE(country, city, product_category)
ORDER BY country, city, product_category;
Database-Specific Features
PostgreSQL
sql
-- JSON functions
SELECT
user_id,
metadata->>'country' AS country,
metadata->>'preferences'->>'theme' AS theme
FROM users;
-- LATERAL joins
SELECT
u.user_id,
o.order_id,
o.revenue
FROM users u
CROSS JOIN LATERAL (
SELECT order_id, revenue
FROM orders
WHERE user_id = u.user_id
ORDER BY order_date DESC
LIMIT 3
) o;
MySQL
sql
-- JSON functions
SELECT
user_id,
JSON_EXTRACT(metadata, '$.country') AS country
FROM users;
-- GROUP_CONCAT
SELECT
user_id,
GROUP_CONCAT(product_name ORDER BY product_name SEPARATOR ', ') AS products
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY user_id;
BigQuery
sql
-- ARRAY and STRUCT
SELECT
user_id,
ARRAY(
SELECT AS STRUCT order_id, revenue
FROM orders
WHERE user_id = u.user_id
LIMIT 3
) AS recent_orders
FROM users u;
-- UNNEST
SELECT
user_id,
order.order_id,
order.revenue
FROM users u,
UNNEST(u.recent_orders) AS order
Snowflake
sql
-- QUALIFY clause (filter window functions)
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_rank
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) <= 3;
Testing Queries
Test with Sample Data
sql
-- Use LIMIT to test
SELECT
user_id,
COUNT(*) AS order_count,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id
LIMIT 10;
Validate Aggregations
sql
-- Spot check calculations
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT user_id) AS unique_users,
SUM(revenue) AS total_revenue,
AVG(revenue) AS avg_revenue
FROM orders;
Check for NULLs
sql
-- Find NULL values
SELECT
COUNT(*) AS total,
COUNT(user_id) AS non_null_user_id,
COUNT(order_date) AS non_null_order_date,
COUNT(revenue) AS non_null_revenue
FROM orders;
Verify Date Ranges
sql
-- Check date range
SELECT
MIN(order_date) AS min_date,
MAX(order_date) AS max_date,
COUNT(*) AS total_records
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';
Query Organization
Formatting
sql
-- Consistent indentation and capitalization
SELECT
u.user_id,
u.signup_date,
COUNT(o.order_id) AS order_count,
SUM(o.revenue) AS total_revenue
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
WHERE u.signup_date >= '2024-01-01'
GROUP BY u.user_id, u.signup_date
ORDER BY total_revenue DESC;
Comments
sql
-- Calculate monthly revenue by user
WITH monthly_user_revenue AS (
SELECT
user_id,
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY user_id, DATE_TRUNC('month', order_date)
)
-- Calculate average monthly revenue per user
SELECT
user_id,
AVG(monthly_revenue) AS avg_monthly_revenue
FROM monthly_user_revenue
GROUP BY user_id
ORDER BY avg_monthly_revenue DESC;
Naming
sql
-- Descriptive CTE names
WITH user_orders AS (
-- User order summary
SELECT
user_id,
COUNT(*) AS order_count,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id
),
high_value_users AS (
-- Users with revenue > $1000
SELECT
user_id,
total_revenue
FROM user_orders
WHERE total_revenue > 1000
)
SELECT
h.user_id,
h.total_revenue,
u.country
FROM high_value_users h
JOIN users u ON h.user_id = u.user_id;
Real SQL Examples
Monthly Active Users (MAU)
sql
SELECT
DATE_TRUNC('month', event_date) AS month,
COUNT(DISTINCT user_id) AS mau
FROM events
WHERE event_type = 'page_view'
GROUP BY DATE_TRUNC('month', event_date)
ORDER BY month;
Cohort Retention
sql
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', signup_date) AS cohort_month
FROM users
),
user_activities AS (
SELECT
user_id,
DATE_TRUNC('month', event_date) AS activity_month
FROM events
WHERE event_type = 'page_view'
)
SELECT
c.cohort_month,
ua.activity_month,
COUNT(DISTINCT ua.user_id) AS retained_users,
COUNT(DISTINCT c.user_id) AS cohort_size,
100.0 * COUNT(DISTINCT ua.user_id) / COUNT(DISTINCT c.user_id) AS retention_pct
FROM cohorts c
LEFT JOIN user_activities ua ON c.user_id = ua.user_id
GROUP BY c.cohort_month, ua.activity_month
ORDER BY c.cohort_month, ua.activity_month;
Revenue by Product Category
sql
SELECT
p.category,
COUNT(o.order_id) AS order_count,
SUM(o.revenue) AS total_revenue,
AVG(o.revenue) AS avg_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
GROUP BY p.category
ORDER BY total_revenue DESC;
Conversion Funnel
sql
WITH funnel_steps AS (
SELECT
session_id,
'visit' AS step,
1 AS step_number
FROM page_views
WHERE page_url = '/'
UNION ALL
SELECT
session_id,
'product_view' AS step,
2 AS step_number
FROM page_views
WHERE page_url LIKE '/product/%'
UNION ALL
SELECT
session_id,
'add_to_cart' AS step,
3 AS step_number
FROM events
WHERE event_type = 'add_to_cart'
UNION ALL
SELECT
session_id,
'purchase' AS step,
4 AS step_number
FROM events
WHERE event_type = 'purchase'
)
SELECT
step,
step_number,
COUNT(DISTINCT session_id) AS users,
LAG(COUNT(DISTINCT session_id)) OVER (ORDER BY step_number) AS previous_users,
ROUND(100.0 * COUNT(DISTINCT session_id) / LAG(COUNT(DISTINCT session_id)) OVER (ORDER BY step_number), 2) AS conversion_rate
FROM funnel_steps
GROUP BY step, step_number
ORDER BY step_number;
Running Total
sql
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS monthly_revenue,
SUM(SUM(revenue)) OVER (
ORDER BY DATE_TRUNC('month', order_date)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Year-Over-Year Growth
sql
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month_num,
SUM(revenue) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
year,
month_num,
revenue,
LAG(revenue, 12) OVER (ORDER BY year, month_num) AS revenue_prev_year,
ROUND(100.0 * (revenue - LAG(revenue, 12) OVER (ORDER BY year, month_num)) /
LAG(revenue, 12) OVER (ORDER BY year, month_num), 2) AS yoy_growth
FROM monthly_revenue
ORDER BY year, month_num;
Common Mistakes
Cartesian Product (Missing JOIN condition)
sql
-- Bad: No JOIN condition
SELECT
u.user_id,
o.order_id
FROM users u
CROSS JOIN orders o;
-- Good: Proper JOIN
SELECT
u.user_id,
o.order_id
FROM users u
JOIN orders o ON u.user_id = o.user_id;
NULL Handling
sql
-- Bad: NULL in arithmetic
SELECT
revenue - discount AS net_revenue
FROM orders;
-- Good: Handle NULL
SELECT
revenue - COALESCE(discount, 0) AS net_revenue
FROM orders;
GROUP BY Without Aggregation
sql
-- Bad: Column not in GROUP BY
SELECT
user_id,
order_date,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id;
-- Good: All non-aggregated columns in GROUP BY
SELECT
user_id,
SUM(revenue) AS total_revenue
FROM orders
GROUP BY user_id;
Division by Zero
sql
-- Bad: Division by zero error
SELECT
revenue / discount_rate AS discounted_revenue
FROM orders;
-- Good: Handle division by zero
SELECT
revenue / NULLIF(discount_rate, 0) AS discounted_revenue
FROM orders;
Quick Start
Common Analytics Queries
sql
-- Daily active users
SELECT
DATE(created_at) as date,
COUNT(DISTINCT user_id) as dau
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;
-- Revenue by product
SELECT
p.name,
SUM(oi.quantity * oi.price) as revenue,
COUNT(DISTINCT o.order_id) as orders
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY p.name
ORDER BY revenue DESC;
-- Cohort retention
WITH first_purchase AS (
SELECT
user_id,
MIN(created_at) as first_purchase_date
FROM orders
GROUP BY user_id
)
SELECT
DATE_TRUNC('month', fp.first_purchase_date) as cohort_month,
COUNT(DISTINCT fp.user_id) as cohort_size,
COUNT(DISTINCT o.user_id) as active_users
FROM first_purchase fp
LEFT JOIN orders o ON fp.user_id = o.user_id
AND DATE_TRUNC('month', o.created_at) = DATE_TRUNC('month', fp.first_purchase_date) + INTERVAL '1 month'
GROUP BY DATE_TRUNC('month', fp.first_purchase_date);
Production Checklist
- Query Optimization: Optimize queries for performance
- Indexes: Appropriate indexes for analytics queries
- Window Functions: Use window functions for rankings
- CTEs: Use CTEs for complex queries
- NULL Handling: Handle NULL values properly
- Data Quality: Validate data quality
- Documentation: Document query purpose and logic
- Testing: Test queries with sample data
- Performance: Monitor query performance
- Caching: Cache expensive queries
- Version Control: Version control SQL queries
- Review: Code review for SQL queries
Anti-patterns
❌ Don't: SELECT *
sql
-- ❌ Bad - Select all columns
SELECT * FROM users WHERE status = 'active'
sql
-- ✅ Good - Select only needed columns
SELECT user_id, email, created_at
FROM users
WHERE status = 'active'
❌ Don't: No Indexes
sql
-- ❌ Bad - Full table scan
SELECT * FROM orders WHERE created_at > '2024-01-01'
-- No index on created_at!
sql
-- ✅ Good - With index
CREATE INDEX idx_orders_created_at ON orders(created_at);
SELECT * FROM orders WHERE created_at > '2024-01-01'
❌ Don't: Ignore NULLs
sql
-- ❌ Bad - NULL handling
SELECT AVG(price) FROM products
-- NULLs included in count but not in sum!
sql
-- ✅ Good - Handle NULLs
SELECT AVG(COALESCE(price, 0)) FROM products
-- Or
SELECT AVG(price) FROM products WHERE price IS NOT NULL
Integration Points
- Dashboard Design (
23-business-analytics/dashboard-design/) - Query results visualization - KPI Metrics (
23-business-analytics/kpi-metrics/) - Metric queries - Database Optimization (
04-database/database-optimization/) - Query optimization
Further Reading
Performance
- Use EXPLAIN to analyze
- Create indexes on filter/join columns
- Avoid SELECT *
- Filter early
- Use appropriate JOIN types
Validation
- Spot check calculations
- Verify date ranges
- Check for NULLs
- Test edge cases
- Document queries
Didn't find tool you were looking for?