Agent skills
data-extraction-patterns

Agent skill

data-extraction-patterns

Common patterns for extracting analytics data from GA4 and GSC with API handling

View SKILL.md on GitHub Repository

Stars 248

Forks 27

Install this agent skill to your Project

npx add-skill https://github.com/MadAppGang/claude-code/tree/main/plugins/seo/skills/data-extraction-patterns

SKILL.md

plugin: seo updated: 2026-01-20

Data Extraction Patterns

When to Use

Setting up analytics data pipelines
Combining data from multiple sources
Handling API rate limits and errors
Caching frequently accessed data
Building data collection workflows

API Reference

Google Analytics 4 (GA4)

MCP Server: mcp-server-google-analytics

Key Operations:

get_report({
  propertyId: "properties/123456789",
  dateRange: { startDate: "30daysAgo", endDate: "today" },
  dimensions: ["pagePath", "date"],
  metrics: ["screenPageViews", "averageSessionDuration", "bounceRate"]
})

Useful Metrics:

Metric	Description	Use Case
screenPageViews	Total page views	Traffic volume
sessions	User sessions	Visitor count
averageSessionDuration	Avg time in session	Engagement
bounceRate	Single-page visits	Content quality
engagementRate	Engaged sessions %	True engagement
scrolledUsers	Users who scrolled	Content consumption

Useful Dimensions:

Dimension	Description
pagePath	URL path
date	Date (for trending)
sessionSource	Traffic source
deviceCategory	Desktop/mobile/tablet

Google Search Console (GSC)

MCP Server: mcp-server-gsc

Key Operations:

search_analytics({
  siteUrl: "https://example.com",
  startDate: "2025-11-27",
  endDate: "2025-12-27",
  dimensions: ["query", "page"],
  rowLimit: 1000
})

get_url_inspection({
  siteUrl: "https://example.com",
  inspectionUrl: "https://example.com/page"
})

Available Metrics:

Metric	Description	Use Case
clicks	Total clicks from search	Traffic from Google
impressions	Times shown in results	Visibility
ctr	Click-through rate	Snippet effectiveness
position	Average ranking	SEO success

Dimensions:

Dimension	Description
query	Search query
page	Landing page URL
country	User country
device	Desktop/mobile/tablet
date	Date (for trending)

Parallel Execution Pattern

Optimal Data Fetch (All Sources)

markdown

## Parallel Data Fetch Pattern

When fetching from multiple sources, issue all requests in a SINGLE message
for parallel execution:

┌─────────────────────────────────────────────────────────────────┐
│  MESSAGE 1: Parallel Data Requests                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  [MCP Call 1]: google-analytics.get_report(...)                 │
│  [MCP Call 2]: google-search-console.search_analytics(...)      │
│                                                                  │
│  → All execute simultaneously                                    │
│  → Results return when all complete                              │
│  → ~2x faster than sequential                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Sequential (When Needed)

Some operations require sequential execution:

markdown

## Sequential Pattern (Dependencies)

When one request depends on another's result:

┌─────────────────────────────────────────────────────────────────┐
│  MESSAGE 1: Get list of pages                                   │
│  → Returns: ["/page1", "/page2", "/page3"]                      │
├─────────────────────────────────────────────────────────────────┤
│  MESSAGE 2: Get details for each page                           │
│  → Uses page list from Message 1                                │
│  → Can parallelize within this message                          │
└─────────────────────────────────────────────────────────────────┘

Rate Limiting

API Rate Limits

API	Limit	Strategy
GA4	10 QPS per property	Batch dimensions
GSC	1,200 requests/min	Paginate large exports

Retry Pattern

bash

#!/bin/bash
# Retry with exponential backoff

MAX_RETRIES=3
RETRY_DELAY=5

fetch_with_retry() {
    local url="$1"
    local attempt=1

    while [ $attempt -le $MAX_RETRIES ]; do
        response=$(curl -s -w "%{http_code}" -o /tmp/response.json "$url")
        http_code="${response: -3}"

        if [ "$http_code" = "200" ]; then
            cat /tmp/response.json
            return 0
        elif [ "$http_code" = "429" ]; then
            echo "Rate limited, waiting ${RETRY_DELAY}s..." >&2
            sleep $RETRY_DELAY
            RETRY_DELAY=$((RETRY_DELAY * 2))
        else
            echo "Error: HTTP $http_code" >&2
            return 1
        fi

        attempt=$((attempt + 1))
    done

    echo "Max retries exceeded" >&2
    return 1
}

Caching Pattern

Session-Based Cache

bash

# Cache structure
SESSION_PATH="/tmp/seo-performance-20251227-143000-example"
CACHE_DIR="${SESSION_PATH}/cache"
CACHE_TTL=3600  # 1 hour in seconds

mkdir -p "$CACHE_DIR"

# Cache key generation
cache_key() {
    echo "$1" | md5sum | cut -d' ' -f1
}

# Check cache
get_cached() {
    local key=$(cache_key "$1")
    local cache_file="${CACHE_DIR}/${key}.json"

    if [ -f "$cache_file" ]; then
        local age=$(($(date +%s) - $(stat -f%m "$cache_file" 2>/dev/null || stat -c%Y "$cache_file")))
        if [ $age -lt $CACHE_TTL ]; then
            cat "$cache_file"
            return 0
        fi
    fi
    return 1
}

# Save to cache
save_cache() {
    local key=$(cache_key "$1")
    local cache_file="${CACHE_DIR}/${key}.json"
    cat > "$cache_file"
}

# Usage
CACHE_KEY="ga4_${URL}_${DATE_RANGE}"
if ! RESULT=$(get_cached "$CACHE_KEY"); then
    RESULT=$(fetch_from_api)
    echo "$RESULT" | save_cache "$CACHE_KEY"
fi

Date Range Standardization

Common Date Ranges

bash

# Standard date range calculations
TODAY=$(date +%Y-%m-%d)

case "$RANGE" in
    "7d")
        START_DATE=$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d "7 days ago" +%Y-%m-%d)
        ;;
    "30d")
        START_DATE=$(date -v-30d +%Y-%m-%d 2>/dev/null || date -d "30 days ago" +%Y-%m-%d)
        ;;
    "90d")
        START_DATE=$(date -v-90d +%Y-%m-%d 2>/dev/null || date -d "90 days ago" +%Y-%m-%d)
        ;;
    "mtd")
        START_DATE=$(date +%Y-%m-01)
        ;;
    "ytd")
        START_DATE=$(date +%Y-01-01)
        ;;
esac

END_DATE="$TODAY"

API-Specific Formats

API	Format	Example
GA4	Relative or ISO	"30daysAgo", "2025-12-01"
GSC	ISO 8601	"2025-12-01"

Graceful Degradation

Data Source Fallback

markdown

## Fallback Strategy

When a data source is unavailable:

┌─────────────────────────────────────────────────────────────────┐
│  PRIMARY SOURCE      │  FALLBACK           │  LAST RESORT       │
├──────────────────────┼─────────────────────┼────────────────────┤
│  GA4 traffic data    │  GSC clicks         │  Estimate from GSC │
│  GSC search perf     │  Manual SERP check  │  WebSearch SERP    │
│  CWV (CrUX)          │  PageSpeed API      │  Lighthouse CLI    │
└──────────────────────┴─────────────────────┴────────────────────┘

Partial Data Output

markdown

## Analysis Report (Partial Data)

### Data Availability

| Source | Status | Impact |
|--------|--------|--------|
| GA4 | NOT CONFIGURED | Missing engagement metrics |
| GSC | AVAILABLE | Full search data |

### Analysis Notes

This analysis is based on limited data sources:
- Search performance metrics are complete (GSC)
- Engagement metrics unavailable (no GA4)

**Recommendation**: Configure GA4 for complete analysis.
Run `/setup-analytics` to add Google Analytics.

Unified Data Model

Combined Output Structure

json

{
  "metadata": {
    "url": "https://example.com/page",
    "fetchedAt": "2025-12-27T14:30:00Z",
    "dateRange": {
      "start": "2025-11-27",
      "end": "2025-12-27"
    }
  },
  "sources": {
    "ga4": {
      "available": true,
      "metrics": {
        "pageViews": 2450,
        "avgTimeOnPage": 222,
        "bounceRate": 38.2,
        "engagementRate": 64.5
      }
    },
    "gsc": {
      "available": true,
      "metrics": {
        "impressions": 15200,
        "clicks": 428,
        "ctr": 2.82,
        "avgPosition": 4.2
      },
      "topQueries": [
        {"query": "seo guide", "clicks": 156, "position": 4}
      ]
    }
  },
  "computed": {
    "healthScore": 72,
    "status": "GOOD"
  }
}

Error Handling

Common Errors

Error	Cause	Resolution
401 Unauthorized	Invalid/expired credentials	Re-run /setup-analytics
403 Forbidden	Missing permissions	Check API access in console
429 Too Many Requests	Rate limit	Wait and retry with backoff
404 Not Found	Invalid property/site	Verify IDs in configuration
500 Server Error	API issue	Retry later, check status page

Error Output Pattern

markdown

## Data Fetch Error

**Source**: Google Analytics 4
**Error**: 403 Forbidden
**Message**: "User does not have sufficient permissions for this property"

### Troubleshooting Steps

1. Verify Service Account email in GA4 Admin
2. Ensure "Viewer" role is granted
3. Check Analytics Data API is enabled
4. Wait 5 minutes for permission propagation

### Workaround

Proceeding with available data sources (GSC).
GA4 engagement metrics will not be included in this analysis.

Maintainer

MadAppGang Core maintainer

Source details

Full Name: MadAppGang/claude-code
Branch: main
Path in repo: plugins/seo/skills/data-extraction-patterns
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

MadAppGang/claude-code

test-skill

A test skill for validation testing. Use when testing skill parsing and validation logic.

248 27

Explore

MadAppGang/claude-code

bad-skill

248 27

Explore

MadAppGang/claude-code

claudish-usage

CRITICAL - Guide for using Claudish CLI ONLY through sub-agents to run Claude Code with OpenRouter models (Grok, GPT-5, Gemini, MiniMax). NEVER run Claudish directly in main context unless user explicitly requests it. Use when user mentions external AI models, Claudish, OpenRouter, or alternative models. Includes mandatory sub-agent delegation patterns, agent selection guide, file-based instructions, and strict rules to prevent context window pollution.

248 27

Explore

MadAppGang/claude-code

release

Plugin release process for MAG Claude Plugins marketplace. Covers version bumping, marketplace.json updates, git tagging, and common mistakes. Use when releasing new plugin versions or troubleshooting update issues.

248 27

Explore

MadAppGang/claude-code

claudish-integration

248 27

Explore

MadAppGang/claude-code

openrouter-trending-models

Fetch trending programming models from OpenRouter rankings. Use when selecting models for multi-model review, updating model recommendations, or researching current AI coding trends. Provides model IDs, context windows, pricing, and usage statistics from the most recent week.

248 27

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Data Extraction Patterns

When to Use

API Reference

Google Analytics 4 (GA4)

Google Search Console (GSC)

Parallel Execution Pattern

Optimal Data Fetch (All Sources)

Sequential (When Needed)

Rate Limiting

API Rate Limits

Retry Pattern

Caching Pattern

Session-Based Cache

Date Range Standardization

Common Date Ranges

API-Specific Formats

Graceful Degradation

Data Source Fallback

Partial Data Output

Unified Data Model

Combined Output Structure

Error Handling

Common Errors

Error Output Pattern

Recommended Agent Skills

test-skill

bad-skill

claudish-usage

release

claudish-integration

openrouter-trending-models