Agent skill

programmatic-seo

Programmatic page generation at scale using template-based SEO, data pipelines, and automated content production. Covers keyword pattern mining, template architecture, data sourcing, quality control, and indexation strategy for 50 to 100K+ page deployments.

Install this agent skill to your Project

npx add-skill https://github.com/borghei/Claude-Skills/tree/main/marketing/programmatic-seo

Metadata

Additional technical details for this skill

tags: seo programmatic templates content-at-scale data-driven-seo
author: borghei
updated: 1774915200
version: 1.0.0
category: marketing-growth

SKILL.md

Programmatic SEO

Production-grade framework for building SEO page sets at scale. Covers the full lifecycle from keyword pattern discovery through template design, data pipeline construction, quality assurance, and post-launch optimization. Designed for deployments ranging from 50 to 100,000+ pages.

When to Use vs When Not To
Initial Assessment
The 14 Playbooks
Playbook Selection Matrix
Keyword Pattern Mining
Data Pipeline Architecture
Template Design System
Quality Control Framework
Internal Linking Architecture
Indexation Strategy
Launch Sequence
Post-Launch Optimization
Anti-Patterns and Penalty Avoidance
Decision Matrix: Build vs Skip
Output Artifacts
Related Skills

When to Use vs When Not To

Use this skill when:

You have a repeating keyword pattern with 50+ variations
You have (or can acquire) structured data to populate pages
The search intent is consistent across variations
Your domain has sufficient authority to compete

Do NOT use when:

Each page requires unique editorial content (use content-creator instead)
Total addressable pages < 30 (manual content is more effective)
You lack a data source and would be generating thin placeholder content
Your domain authority is below DR 20 and competitors are DR 60+

Initial Assessment

Before designing any pSEO strategy, answer these questions. Skip nothing.

1. Opportunity Validation

Question	Why It Matters	Red Flag
What is the repeating keyword pattern?	Defines the template structure	Pattern is vague or inconsistent
What is the aggregate monthly search volume?	Determines ROI ceiling	< 5,000 aggregate monthly searches
How many unique pages can you generate?	Scope the project	< 50 pages (too few) or > 50K without data infrastructure
What does the SERP look like for sample queries?	Competitive feasibility	Page 1 dominated by DR 80+ editorial content
Is intent informational, navigational, or transactional?	Template design	Mixed intent across the same pattern

2. Data Source Evaluation

Rate your data source on this scale:

Tier	Source Type	Defensibility	Example
S	Proprietary first-party	Unbeatable	Your product usage data, internal benchmarks
A	Product-derived	Strong	Aggregated user analytics, customer outcomes
B	User-generated	Moderate	Community reviews, submitted content
C	Licensed exclusive	Moderate	Paid data feed no competitor has
D	Public aggregated	Weak	Government data, public APIs
F	Scraped commodity	None	Wikipedia rewrites, copied listings

Rule: Do not build pSEO on Tier F data. Commodity rewrites fall under the scraped-content and scaled-content-abuse sections of Google's spam policies. If your only data source is public and easily replicable, invest in acquiring Tier A-C data first.

3. Competitive Moat Assessment

For 5 sample queries in your pattern, analyze page 1 results:

What is the average Domain Rating of ranking pages?
Are existing results programmatic or editorial?
What unique data do ranking pages provide?
What is the content depth (word count, data richness, UX quality)?

Go/No-Go threshold: If the average DR gap between you and page 1 is > 30 AND existing results have proprietary data, the opportunity requires either a differentiated approach or domain authority building first.

The 14 Playbooks

#	Playbook	Pattern	Example	Data Requirement
1	Templates	"[Type] template"	"resume template", "invoice template"	Template files + metadata
2	Curation	"best [category]"	"best CRM for startups"	Product/service reviews + ratings
3	Conversions	"[X] to [Y]"	"100 USD to EUR"	Conversion logic/API
4	Comparisons	"[X] vs [Y]"	"Notion vs Confluence"	Feature data for both products
5	Examples	"[type] examples"	"landing page examples"	Curated example collection
6	Locations	"[service] in [city]"	"coworking in Austin"	Location-specific data
7	Personas	"[product] for [audience]"	"CRM for real estate"	Audience-specific use cases
8	Integrations	"[A] + [B] integration"	"Slack Asana integration"	Integration documentation
9	Glossary	"what is [term]"	"what is churn rate"	Domain expertise
10	Translations	Content in N languages	Localized guides	Translation + localization data
11	Directory	"[category] tools"	"AI writing tools"	Tool listings + evaluations
12	Profiles	"[entity name]"	"Stripe company profile"	Entity-level data
13	Statistics	"[topic] statistics"	"SaaS churn statistics 2026"	Verified statistical data
14	Calculators	"[topic] calculator"	"LTV calculator"	Calculation logic + inputs

Playbook Selection Matrix

If you have...	Primary Playbook	Secondary Layer
A product with many integrations	Integrations	Comparisons
A design/creative tool	Templates + Examples	Personas
A multi-segment audience	Personas	Comparisons
Local/regional presence	Locations	Directory
A tool/utility product	Calculators + Conversions	Glossary
Deep domain expertise	Glossary + Statistics	Curation
A competitor landscape to exploit	Comparisons + Curation	Directory
User-generated content	Examples + Directory	Profiles

Layering rule: Combine up to 2 playbooks per page set. Example: "Best coworking spaces in [city]" = Curation + Locations.

Keyword Pattern Mining

Step 1: Pattern Identification

Extract the repeating structure from seed keywords:

Seed: "react developer salary san francisco"
Pattern: [role] salary [city]
Variables: role (200+ options), city (500+ options)
Max pages: 200 x 500 = 100,000
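
The page-count arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own `keyword_pattern_miner.py`; the function name `expand_pattern` and the sample variable values are hypothetical.

```python
from itertools import product

def expand_pattern(pattern: str, variables: dict[str, list[str]]) -> list[str]:
    """Fill every combination of variable values into a bracketed pattern."""
    names = list(variables)
    keywords = []
    for combo in product(*(variables[name] for name in names)):
        keyword = pattern
        for name, value in zip(names, combo):
            keyword = keyword.replace(f"[{name}]", value)
        keywords.append(keyword)
    return keywords

# Hypothetical sample values; a real run would load 200+ roles and 500+ cities.
variables = {
    "role": ["react developer", "data engineer"],
    "city": ["san francisco", "austin", "denver"],
}
keywords = expand_pattern("[role] salary [city]", variables)
print(len(keywords))  # 2 roles x 3 cities = 6
print(keywords[0])    # react developer salary san francisco
```

The maximum page count is the product of the variable list lengths, which is why two modest lists can reach 100,000 candidate pages.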

Step 2: Volume Distribution Analysis

Not all variable combinations have search volume. Map the distribution:

Tier	Volume Range	Typical % of Total Pages	Strategy
Head	1,000+ monthly	2-5%	Priority indexation, highest content quality
Torso	100-999 monthly	15-25%	Standard template, full deployment
Long-tail	10-99 monthly	40-50%	Template with conditional content blocks
Zero-volume	< 10 monthly	20-40%	Noindex OR skip unless data is uniquely valuable
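
As a rough sketch of how the tier boundaries in this table could be applied to a keyword export, the snippet below buckets volumes and reports the share of pages per tier. The function names and sample volumes are assumptions for illustration only.

```python
from collections import Counter

TIERS = ("head", "torso", "long-tail", "zero-volume")

def volume_tier(monthly_searches: int) -> str:
    """Map a monthly search volume to the tier names used in the table above."""
    if monthly_searches >= 1000:
        return "head"
    if monthly_searches >= 100:
        return "torso"
    if monthly_searches >= 10:
        return "long-tail"
    return "zero-volume"

def tier_shares(volumes: dict[str, int]) -> dict[str, float]:
    """Return the fraction of keywords that falls into each tier."""
    counts = Counter(volume_tier(v) for v in volumes.values())
    total = len(volumes) or 1  # avoid dividing by zero on an empty export
    return {tier: counts[tier] / total for tier in TIERS}

# Hypothetical volumes; real numbers would come from a keyword tool export.
sample = {
    "react developer salary san francisco": 1900,
    "data engineer salary austin": 320,
    "qa engineer salary boise": 40,
    "cobol developer salary fargo": 0,
}
print(tier_shares(sample))
```

Comparing the resulting shares with the "Typical % of Total Pages" column is a quick way to spot a pattern whose demand is concentrated in too few combinations.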

Step 3: Intent Classification

For each pattern, verify intent consistency:

Intent Type	Template Implications	CTA Strategy
Informational	Data-heavy, educational content	Newsletter, related content
Commercial investigation	Comparison tables, pros/cons	Free trial, demo
Transactional	Pricing, availability, features	Buy now, sign up
Navigational	Brand-specific, direct answer	Product page link

Data Pipeline Architecture

Pipeline Design

[Data Source] → [Extraction] → [Transformation] → [Enrichment] → [Validation] → [Template Population] → [Quality Check] → [Publish]

Data Quality Gates

Every record must pass these gates before page generation:

Gate	Check	Failure Action
Completeness	All required fields populated	Skip page, log for manual review
Accuracy	Data matches source, no staleness > 90 days	Flag for refresh
Uniqueness	No duplicate records	Merge or deduplicate
Minimum richness	Page will have > 300 words of unique content	Skip or enrich
Legal compliance	Data usage rights verified	Block publication
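
One way to express these gates in code is a function that returns the names of the gates a record fails. This is a sketch under stated assumptions: the record schema (`slug`, `title`, `body`, `last_updated`, `rights_verified`) is hypothetical, the accuracy gate only covers the 90-day staleness half of the check (comparing against the source needs source access), and whitespace-separated tokens stand in for "unique words".

```python
from datetime import date

REQUIRED_FIELDS = ("slug", "title", "body")  # hypothetical schema
MAX_AGE_DAYS = 90
MIN_UNIQUE_WORDS = 300

def failed_gates(record: dict, seen_slugs: set[str], today: date) -> list[str]:
    """Return the quality gates this record fails, in table order."""
    failures = []
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        failures.append("completeness")
    updated = record.get("last_updated")
    # Staleness only; verifying the data against its source is out of scope here.
    if updated is None or (today - updated).days > MAX_AGE_DAYS:
        failures.append("accuracy")
    if record.get("slug") in seen_slugs:
        failures.append("uniqueness")
    if len((record.get("body") or "").split()) <= MIN_UNIQUE_WORDS:
        failures.append("minimum richness")
    if not record.get("rights_verified", False):
        failures.append("legal compliance")
    return failures

record = {
    "slug": "austin",
    "title": "Coworking in Austin",
    "body": "too short",
    "last_updated": date(2025, 1, 1),
    "rights_verified": True,
}
print(failed_gates(record, seen_slugs={"austin"}, today=date(2026, 2, 1)))
```

Returning every failed gate rather than stopping at the first one makes the log for manual review more useful, since a record often fails several gates at once.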

Update Cadence

Data Type	Recommended Update Frequency	Staleness Penalty
Pricing data	Weekly	High (users notice immediately)
Company/product data	Monthly	Medium
Statistical data	Quarterly	Low if year-tagged
Glossary/educational	Semi-annually	Very low
Location data	Monthly	Medium (closures, address changes)

Template Design System

Page Architecture

Every programmatic page must have these zones:

┌─────────────────────────────────────┐
│ Zone 1: Unique Header               │  H1 with target keyword, unique intro paragraph
├─────────────────────────────────────┤
│ Zone 2: Primary Data Section         │  The core data/content for this specific page
├─────────────────────────────────────┤
│ Zone 3: Contextual Analysis          │  Insights, comparisons, trends specific to this entity
├─────────────────────────────────────┤
│ Zone 4: Related Data                 │  Adjacent data points that add depth
├─────────────────────────────────────┤
│ Zone 5: Internal Navigation          │  Related pages, breadcrumbs, category links
├─────────────────────────────────────┤
│ Zone 6: CTA                         │  Conversion element matched to intent
└─────────────────────────────────────┘

Uniqueness Requirements

Each page MUST have at least 3 of these 5 uniqueness sources:

Unique data points -- Numbers, facts, or attributes specific to this entity
Conditional content blocks -- Sections that appear/disappear based on data attributes
Calculated insights -- Derived metrics (percentages, comparisons, rankings)
Contextual recommendations -- "If X, then Y" advice blocks based on the data
User-generated content -- Reviews, comments, or community contributions

URL Structure

Always use subfolders. Never subdomains for pSEO.

Pattern	URL Template	Example
Location	`/[service]/[city]/`	`/coworking/austin/`
Comparison	`/compare/[a]-vs-[b]/`	`/compare/notion-vs-confluence/`
Integration	`/integrations/[partner]/`	`/integrations/slack/`
Glossary	`/glossary/[term]/`	`/glossary/churn-rate/`
Persona	`/[product]-for-[audience]/`	`/crm-for-real-estate/`
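
A small helper can keep generated URLs consistent with these templates. The sketch below assumes ASCII-only, lowercase, hyphen-separated slugs, which is a common convention rather than something the table mandates; `slugify` and `build_url` are hypothetical names.

```python
import re
import unicodedata

def slugify(value: str) -> str:
    """Lowercase, strip accents, and collapse non-alphanumerics into hyphens."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def build_url(template: str, **values: str) -> str:
    """Replace each [placeholder] in a URL template with a slugified value."""
    url = template
    for name, value in values.items():
        url = url.replace(f"[{name}]", slugify(value))
    return url

print(build_url("/compare/[a]-vs-[b]/", a="Notion", b="Confluence"))
# /compare/notion-vs-confluence/
print(build_url("/[service]/[city]/", service="Coworking", city="São Paulo"))
# /coworking/sao-paulo/
```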

Quality Control Framework

Pre-Publication QA Checklist

Content Quality:

Each page has > 300 words of unique content (not counting shared template elements)
H1 is unique and contains the target keyword
Meta title is unique (< 60 chars) and meta description is unique (< 155 chars)
No broken data references (no empty fields rendered as "N/A" or left blank)
At least 2 conditional content blocks triggered per page
No duplicate pages targeting the same keyword

Technical SEO:

Canonical tag points to self
Hreflang tags if multilingual
Schema markup renders without errors
Page loads in < 3 seconds
Mobile responsive

Internal Linking:

Breadcrumb trail is complete
3-5 related pages linked contextually
Hub page links to this page
No orphan pages in the set

Thin Content Detection

Run this check against every generated page:

Signal	Threshold	Action
Unique word count	< 200 unique words	Block publication
Content similarity to another page in set	> 80% Jaccard similarity	Merge or differentiate
Data fields populated	< 60% of template fields	Skip or enrich
User time-on-page (post-launch)	< 15 seconds average	Review and improve
Bounce rate (post-launch)	> 85%	Review intent match
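
The similarity row can be illustrated with a word-set Jaccard check. This is a minimal sketch, not the skill's `template_scorer.py`: it assumes similarity is measured on lowercase word sets (shingles of several words would be a stricter variant), and the pairwise loop is quadratic, so very large page sets would need an approximate method instead.

```python
from itertools import combinations

def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two texts' lowercase word sets."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty pages are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

def too_similar(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return every pair of pages whose similarity exceeds the threshold."""
    flagged = []
    for (slug_a, text_a), (slug_b, text_b) in combinations(pages.items(), 2):
        score = jaccard(text_a, text_b)
        if score > threshold:
            flagged.append((slug_a, slug_b, score))
    return flagged

# Hypothetical city-swap pages: only one word differs between them.
pages = {
    "austin": "find the best coworking spaces in austin with pricing and reviews",
    "denver": "find the best coworking spaces in denver with pricing and reviews",
}
for slug_a, slug_b, score in too_similar(pages):
    print(slug_a, slug_b, round(score, 3))
```

The two sample pages share 10 of 12 distinct words, so they score about 0.833 and are flagged, which is the city-name-swapping pattern the check is meant to catch.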

Internal Linking Architecture

Hub-and-Spoke Model

                    ┌─────────┐
                    │  HUB    │  /coworking/
                    │  PAGE   │  (ranks for "coworking spaces")
                    └────┬────┘
          ┌──────────────┼──────────────┐
     ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
     │ SPOKE 1 │    │ SPOKE 2 │    │ SPOKE 3 │
     │ /austin/│    │ /denver/│    │/seattle/│
     └────┬────┘    └────┬────┘    └────┬────┘
          │              │              │
     Cross-links between related spokes

Linking rules:

Hub links DOWN to every spoke (or top 50 spokes if > 200 pages)
Every spoke links UP to the hub
Spokes link ACROSS to 3-5 related spokes (geographic proximity, thematic similarity)
Deep pages link UP to their spoke AND the hub
Cross-silo links only when contextually genuine
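
The first three linking rules can be sketched as a function that builds a link plan and a second that checks it for orphans. The names are hypothetical, `spokes` is assumed to be sorted by priority, and `related` is assumed to come from your own proximity or similarity data.

```python
def build_link_plan(hub: str, spokes: list[str], related: dict[str, list[str]],
                    max_across: int = 5) -> dict[str, list[str]]:
    """Map each page to the pages it should link to."""
    known = set(spokes)
    # Hub links down to every spoke, or only the top 50 when the set exceeds 200.
    hub_targets = spokes if len(spokes) <= 200 else spokes[:50]
    plan = {hub: list(hub_targets)}
    for spoke in spokes:
        across = [s for s in related.get(spoke, []) if s in known and s != spoke]
        plan[spoke] = [hub] + across[:max_across]  # up to the hub, then across
    return plan

def find_orphans(plan: dict[str, list[str]], hub: str) -> set[str]:
    """Return pages in the plan that no other page links to."""
    linked = {target for targets in plan.values() for target in targets}
    return set(plan) - linked - {hub}

hub = "/coworking/"
spokes = ["/coworking/austin/", "/coworking/denver/", "/coworking/seattle/"]
related = {
    "/coworking/austin/": ["/coworking/denver/"],
    "/coworking/denver/": ["/coworking/austin/", "/coworking/seattle/"],
    "/coworking/seattle/": ["/coworking/denver/"],
}
plan = build_link_plan(hub, spokes, related)
print(plan["/coworking/denver/"])
print(find_orphans(plan, hub))  # set()
```

With more than 200 spokes, pages outside the hub's top 50 rely on cross-links alone, so running `find_orphans` on the plan shows which ones need a sub-hub.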

Pagination for Large Sets

If a hub page has > 50 spokes, implement paginated sub-hubs:

/coworking/                     → Top cities + browse by state
/coworking/california/          → All California cities
/coworking/california/page/2/   → Paginated if > 25 cities

Indexation Strategy

Crawl Budget Management

Page Set Size	Strategy
< 500 pages	Single XML sitemap, submit all
500-5,000	Segmented sitemaps by category
5,000-50,000	Segmented sitemaps + priority scoring + IndexNow
50,000+	Programmatic sitemap generation + crawl budget monitoring + strategic noindex
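
For the segmented-sitemap rows, a sketch along these lines could split URLs by category and by file size. The 50,000-URL cap per file comes from the sitemaps.org protocol; the file naming scheme and function name are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemap protocol

def segment_sitemaps(urls_by_category: dict[str, list[str]],
                     max_urls: int = MAX_URLS_PER_FILE) -> dict[str, str]:
    """Return {filename: xml} with one or more sitemap files per category."""
    files = {}
    for category, urls in urls_by_category.items():
        for start in range(0, len(urls), max_urls):
            part = start // max_urls + 1
            urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
            for url in urls[start:start + max_urls]:
                ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
            # Hypothetical naming scheme: sitemap-<category>-<part>.xml
            files[f"sitemap-{category}-{part}.xml"] = ET.tostring(urlset, encoding="unicode")
    return files

# Small max_urls value only to demonstrate the split.
files = segment_sitemaps(
    {"coworking": [
        "https://example.com/coworking/austin/",
        "https://example.com/coworking/denver/",
        "https://example.com/coworking/seattle/",
    ]},
    max_urls=2,
)
print(sorted(files))
```

Per-category files make it easier to compare submitted versus indexed counts for each segment in Search Console.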

Indexation Priority

Priority	Pages	Action
P0	Hub pages	Submit immediately, internal link from homepage
P1	Head-volume spokes (top 10%)	Submit in first sitemap batch
P2	Torso-volume spokes	Submit in second batch, 1-2 weeks later
P3	Long-tail spokes	Submit gradually over 4-6 weeks
P4	Zero-volume pages	Noindex unless data is uniquely valuable

IndexNow Integration

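For large-scale updates, use IndexNow to notify participating search engines (Bing and Yandex, among others) immediately. Google does not consume IndexNow submissions, so sitemaps and Search Console remain the route for Google: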

POST https://api.indexnow.org/indexnow
Content-Type: application/json; charset=utf-8

{
  "host": "yoursite.com",
  "key": "your-api-key",
  "urlList": ["https://yoursite.com/page1", "https://yoursite.com/page2"]
}
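
The same request could be built from Python's standard library. This is a sketch that only constructs the request object; the helper name is hypothetical, the host and key are the placeholders from the example above, and the key is assumed to be verifiable on your host as the protocol requires.

```python
import json
import urllib.request

def build_indexnow_request(host: str, key: str, urls: list[str],
                           endpoint: str = "https://api.indexnow.org/indexnow") -> urllib.request.Request:
    """Build (but do not send) an IndexNow submission for a batch of URLs."""
    payload = {"host": host, "key": key, "urlList": list(urls)}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

request = build_indexnow_request(
    host="yoursite.com",
    key="your-api-key",  # placeholder, as in the example above
    urls=["https://yoursite.com/page1", "https://yoursite.com/page2"],
)
print(request.get_method(), request.full_url)
# Sending is left out so the sketch has no side effects:
# urllib.request.urlopen(request)
```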

Launch Sequence

Phase 1: Pilot (Week 1-2)

Deploy 20-50 pages from head-volume tier
Submit sitemap with pilot pages only
Monitor indexation rate daily
Check for crawl errors in Search Console

Phase 2: Scale (Week 3-6)

Deploy remaining torso-volume pages in batches of 100-500
Add cross-links between deployed pages
Monitor thin content warnings
Track impressions in Search Console

Phase 3: Long-Tail (Week 7-12)

Deploy long-tail pages
Noindex zero-volume pages (keep them crawlable but not indexed)
Begin link acquisition outreach for hub pages

Phase 4: Optimization (Ongoing)

A/B test template variations on head-volume pages
Refresh stale data quarterly
Add conditional content blocks based on engagement data
Monitor for keyword cannibalization across the set

Post-Launch Optimization

Metrics Dashboard

Metric	Frequency	Target
Indexation rate	Weekly	> 90% of submitted pages indexed within 60 days
Organic impressions	Weekly	Trending up month-over-month
Average position (by tier)	Bi-weekly	Head pages: top 10; Torso: top 30
Click-through rate	Monthly	> 3% for head pages
Bounce rate	Monthly	< 70%
Conversion rate	Monthly	> 1% for transactional intent
Pages per session	Monthly	> 1.5

Optimization Playbook

Signal	Diagnosis	Action
Indexed but not ranking	Content quality or authority gap	Enrich content, build links to hub
Ranking but low CTR	Title/description not compelling	A/B test meta titles
Ranking but high bounce	Intent mismatch or thin content	Audit against search intent, add data
Deindexed after initial indexing	Thin content penalty	Improve uniqueness, reduce similarity
Crawled but not indexed	Quality threshold not met	Add more unique content per page

Anti-Patterns and Penalty Avoidance

Anti-Pattern	Why It Fails	Prevention
City-name swapping	Same content + different city = doorway page penalty	Each location page needs unique local data
Keyword stuffing in templates	Unnatural density triggers spam filters	Keep keyword density 1-2%, write naturally
Generating pages for zero-demand queries	Wastes crawl budget, signals low quality	Validate demand before generating
No internal links to pSEO pages	Orphan pages get deprioritized	Connect every page to the hub-spoke structure
Stale data never refreshed	Users lose trust, Google notices	Set update cadence per data type
All pages identical structure	Lack of variation signals automation to Google	Use 3-5 template variants

Decision Matrix: Build vs Skip

Score each dimension 1-5, then apply the threshold.

Dimension	Weight	1 (Skip)	5 (Build)
Search demand	30%	< 1K aggregate monthly	> 50K aggregate monthly
Data quality	25%	Public/scraped, easily replicated	Proprietary, defensible
Competitive gap	20%	DR gap > 40, strong incumbents	DR gap < 15, weak/no incumbents
Template feasibility	15%	Each page needs unique editorial	Clean template fits all variations
Business alignment	10%	No conversion path from these pages	Direct path to core product

Scoring guide:

4.0+ weighted average: Build immediately
3.0-3.9: Build if resources allow, validate with pilot first
2.0-2.9: Invest in data quality or authority first
< 2.0: Do not build
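
The scoring guide reduces to a weighted average followed by a threshold lookup. The sketch below uses the weights from the table; the dimension keys and function names are hypothetical, and scores between the listed bands (for example 3.95) are assigned to the lower band, which is an assumption the guide does not spell out.

```python
WEIGHTS = {
    "search_demand": 0.30,
    "data_quality": 0.25,
    "competitive_gap": 0.20,
    "template_feasibility": 0.15,
    "business_alignment": 0.10,
}

def weighted_score(scores: dict[str, int]) -> float:
    """Weighted average of 1-5 scores across the five dimensions."""
    for dimension in WEIGHTS:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} must be scored 1-5")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def recommendation(score: float) -> str:
    """Map a weighted score onto the scoring guide above."""
    if score >= 4.0:
        return "Build immediately"
    if score >= 3.0:
        return "Build if resources allow, validate with pilot first"
    if score >= 2.0:
        return "Invest in data quality or authority first"
    return "Do not build"

# Hypothetical assessment of one opportunity.
scores = {
    "search_demand": 4,
    "data_quality": 5,
    "competitive_gap": 3,
    "template_feasibility": 4,
    "business_alignment": 2,
}
total = weighted_score(scores)
print(round(total, 2), recommendation(total))
```

For the sample scores the weighted average is 1.2 + 1.25 + 0.6 + 0.6 + 0.2 = 3.85, which lands in the "validate with pilot first" band.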

Output Artifacts

Artifact	Format	Description
Opportunity Analysis	Markdown table	Keyword patterns x volume x data source x difficulty x business alignment
Playbook Recommendation	Decision matrix	If/then mapping with rationale and real-world examples
Page Template Specification	Annotated wireframe (markdown)	URL pattern, zone structure, uniqueness sources, conditional logic
Data Pipeline Spec	Flow diagram (text)	Source > extraction > transformation > validation > publication
Quality Scorecard	Checklist + thresholds	Pre-publication QA gates with pass/fail criteria
Indexation Plan	Phased timeline	Priority tiers, sitemap structure, crawl budget allocation
Post-Launch Dashboard	Metric table	KPIs, targets, review cadence, optimization triggers

Related Skills

seo-audit -- Run after pSEO pages are live to diagnose indexation issues, thin content warnings, or ranking problems across the page set.
schema-markup -- Add structured data to pSEO templates (Product, FAQ, LocalBusiness) for rich snippet eligibility at scale.
site-architecture -- Plan hub-and-spoke structure and crawl budget management for large pSEO deployments (500+ pages).
competitor-alternatives -- Use the Comparisons playbook when building "[X] vs [Y]" pages; competitor-alternatives has dedicated comparison page frameworks.
content-creator -- Use when individual pages in the set need editorial-quality unique content beyond template generation.

Troubleshooting

Problem	Likely Cause	Fix
Google deindexed 90%+ of pSEO pages	Thin content — pages have insufficient unique content (<300 words) or >80% similarity	Increase unique content per page to 500+ words; ensure 30-40% differentiation between pages
Pages indexed but getting zero traffic	Pages target zero-volume keywords or content does not match search intent	Validate demand before generating; noindex zero-volume pages; verify intent alignment
"Doorway pages" manual action in GSC	Template pages with only variable substitution (city name swap) and no unique value	Add genuinely unique data per page — local stats, specific recommendations, conditional content blocks
Hub page ranks but spokes do not	Spokes missing inbound internal links or hub not linking down to spokes	Verify bidirectional hub-spoke linking; add contextual cross-links between related spokes
Crawl budget exhausted before all pages indexed	Too many pages submitted at once or low-value pages consuming crawl resources	Phase deployment in batches of 100-500; use tiered indexation with strategic noindex
Content similarity too high across page set	Template lacks conditional content blocks; only variable substitution used	Add 3-5 conditional content sections per template that change based on data attributes
AI content detection flagging pSEO pages	Over-reliance on AI generation without human editorial review	Use AI for data enrichment only, not full content generation; sample 5-10% for quality review

Success Criteria

Indexation rate: 90%+ of submitted pages indexed within 60 days of deployment
Content uniqueness: Every page has 500+ unique words with <40% similarity to any other page in the set (an internal target; Google publishes no numeric similarity threshold)
Head keyword rankings: Top 10% of pages (by volume) ranking in top 30 within 90 days
Organic traffic growth: Page set generating measurable organic traffic within 60 days of full deployment
Thin content rate: Zero pages flagged as thin content in Google Search Console
Bounce rate: Below 70% average across the page set (indicating intent match)
Conversion rate: 1%+ for transactional intent pages, measurable lead capture for informational pages

Scope & Limitations

In scope:

Keyword pattern mining and volume distribution analysis
Data pipeline design (source > extraction > transformation > validation > publication)
Template architecture with uniqueness requirements
Quality control frameworks including thin content detection
Hub-and-spoke internal linking for pSEO page sets
Phased indexation strategy and crawl budget management
Post-launch optimization and monitoring dashboards

Out of scope:

Individual editorial content creation (use content-creator)
Data collection or web scraping implementation
CMS or static site generator setup and configuration
Server infrastructure for large-scale deployments
Paid acquisition for pSEO pages
Legal compliance for data usage rights

Known limitations:

Google's helpful-content signals, part of its core ranking systems since March 2024, can demote or drop large page sets retroactively if quality falls below threshold
Programmatic SEO built on Tier F data (scraped commodity) carries high penalty risk regardless of template quality
Engagement metrics (bounce rate, time on page) may influence how pSEO pages are ranked and retained in the index
AI content detection is improving — fully automated content generation without human oversight is increasingly risky
Travel site case study: 50,000 city-swap pages had 98% deindexed within 3 months (per 2025 industry data)

Scripts

# Analyze keyword patterns for pSEO opportunities
python scripts/keyword_pattern_miner.py --keywords keywords.csv --json

# Score page templates for content quality and uniqueness
python scripts/template_scorer.py --template template.html --data sample_data.json

# Validate data quality for pSEO data pipeline
python scripts/data_validator.py --file data.csv --rules rules.json --json

Maintainer

borghei Core maintainer

Source details

Full Name: borghei/Claude-Skills
Branch: main
Path in repo: marketing/programmatic-seo
License: Other
Topics: claude-code automation ai-agents cursor developer-tools agentic-coding github-copilot prompt-engineering llm python ai-coding-assistant ai-skills windsurf openai-codex compliance-automation eu-ai-act gdpr-compliance iso-27001 role-based-agents soc2
