Agent skill
programmatic-seo
Programmatic page generation at scale using template-based SEO, data pipelines, and automated content production. Covers keyword pattern mining, template architecture, data sourcing, quality control, and indexation strategy for 100-100K+ page deployments.
Install this agent skill to your Project
npx add-skill https://github.com/borghei/Claude-Skills/tree/main/marketing/programmatic-seo
Metadata
Additional technical details for this skill
- tags
-
seo programmatic templates content-at-scale data-driven-seo
- author
- borghei
- updated
- 1774915200
- version
- 1.0.0
- category
- marketing-growth
SKILL.md
Programmatic SEO
Production-grade framework for building SEO page sets at scale. Covers the full lifecycle from keyword pattern discovery through template design, data pipeline construction, quality assurance, and post-launch optimization. Designed for deployments ranging from 50 to 100,000+ pages.
Table of Contents
- When to Use vs When Not To
- Initial Assessment
- The 14 Playbooks
- Playbook Selection Matrix
- Keyword Pattern Mining
- Data Pipeline Architecture
- Template Design System
- Quality Control Framework
- Internal Linking Architecture
- Indexation Strategy
- Launch Sequence
- Post-Launch Optimization
- Anti-Patterns and Penalty Avoidance
- Decision Matrix: Build vs Skip
- Output Artifacts
- Related Skills
When to Use vs When Not To
Use this skill when:
- You have a repeating keyword pattern with 50+ variations
- You have (or can acquire) structured data to populate pages
- The search intent is consistent across variations
- Your domain has sufficient authority to compete
Do NOT use when:
- Each page requires unique editorial content (use content-creator instead)
- Total addressable pages < 30 (manual content is more effective)
- You lack a data source and would be generating thin placeholder content
- Your domain authority is below DR 20 and competitors are DR 60+
Initial Assessment
Before designing any pSEO strategy, answer these questions. Skip nothing.
1. Opportunity Validation
| Question | Why It Matters | Red Flag |
|---|---|---|
| What is the repeating keyword pattern? | Defines the template structure | Pattern is vague or inconsistent |
| What is the aggregate monthly search volume? | Determines ROI ceiling | < 5,000 aggregate monthly searches |
| How many unique pages can you generate? | Scope the project | < 50 pages (too few) or > 50K without data infrastructure |
| What does the SERP look like for sample queries? | Competitive feasibility | Page 1 dominated by DR 80+ editorial content |
| Is intent informational, navigational, or transactional? | Template design | Mixed intent across the same pattern |
2. Data Source Evaluation
Rate your data source on this scale:
| Tier | Source Type | Defensibility | Example |
|---|---|---|---|
| S | Proprietary first-party | Unbeatable | Your product usage data, internal benchmarks |
| A | Product-derived | Strong | Aggregated user analytics, customer outcomes |
| B | User-generated | Moderate | Community reviews, submitted content |
| C | Licensed exclusive | Moderate | Paid data feed no competitor has |
| D | Public aggregated | Weak | Government data, public APIs |
| F | Scraped commodity | None | Wikipedia rewrites, copied listings |
Rule: Do not build pSEO on Tier F data. Google penalizes commodity rewrites. If your only data source is public and easily replicable, invest in acquiring Tier A-C data first.
3. Competitive Moat Assessment
For 5 sample queries in your pattern, analyze page 1 results:
- What is the average Domain Rating of ranking pages?
- Are existing results programmatic or editorial?
- What unique data do ranking pages provide?
- What is the content depth (word count, data richness, UX quality)?
Go/No-Go threshold: If the average DR gap between you and page 1 is > 30 AND existing results have proprietary data, the opportunity requires either a differentiated approach or domain authority building first.
The 14 Playbooks
| # | Playbook | Pattern | Example | Data Requirement |
|---|---|---|---|---|
| 1 | Templates | "[Type] template" | "resume template", "invoice template" | Template files + metadata |
| 2 | Curation | "best [category]" | "best CRM for startups" | Product/service reviews + ratings |
| 3 | Conversions | "[X] to [Y]" | "100 USD to EUR" | Conversion logic/API |
| 4 | Comparisons | "[X] vs [Y]" | "Notion vs Confluence" | Feature data for both products |
| 5 | Examples | "[type] examples" | "landing page examples" | Curated example collection |
| 6 | Locations | "[service] in [city]" | "coworking in Austin" | Location-specific data |
| 7 | Personas | "[product] for [audience]" | "CRM for real estate" | Audience-specific use cases |
| 8 | Integrations | "[A] + [B] integration" | "Slack Asana integration" | Integration documentation |
| 9 | Glossary | "what is [term]" | "what is churn rate" | Domain expertise |
| 10 | Translations | Content in N languages | Localized guides | Translation + localization data |
| 11 | Directory | "[category] tools" | "AI writing tools" | Tool listings + evaluations |
| 12 | Profiles | "[entity name]" | "Stripe company profile" | Entity-level data |
| 13 | Statistics | "[topic] statistics" | "SaaS churn statistics 2026" | Verified statistical data |
| 14 | Calculators | "[topic] calculator" | "LTV calculator" | Calculation logic + inputs |
Playbook Selection Matrix
| If you have... | Primary Playbook | Secondary Layer |
|---|---|---|
| A product with many integrations | Integrations | Comparisons |
| A design/creative tool | Templates + Examples | Personas |
| A multi-segment audience | Personas | Comparisons |
| Local/regional presence | Locations | Directory |
| A tool/utility product | Calculators + Conversions | Glossary |
| Deep domain expertise | Glossary + Statistics | Curation |
| A competitor landscape to exploit | Comparisons + Curation | Directory |
| User-generated content | Examples + Directory | Profiles |
Layering rule: Combine up to 2 playbooks per page set. Example: "Best coworking spaces in [city]" = Curation + Locations.
Keyword Pattern Mining
Step 1: Pattern Identification
Extract the repeating structure from seed keywords:
Seed: "react developer salary san francisco"
Pattern: [role] salary [city]
Variables: role (200+ options), city (500+ options)
Max pages: 200 x 500 = 100,000
Step 2: Volume Distribution Analysis
Not all variable combinations have search volume. Map the distribution:
| Tier | Volume Range | Typical % of Total Pages | Strategy |
|---|---|---|---|
| Head | 1,000+ monthly | 2-5% | Priority indexation, highest content quality |
| Torso | 100-999 monthly | 15-25% | Standard template, full deployment |
| Long-tail | 10-99 monthly | 40-50% | Template with conditional content blocks |
| Zero-volume | < 10 monthly | 20-40% | Noindex OR skip unless data is uniquely valuable |
Step 3: Intent Classification
For each pattern, verify intent consistency:
| Intent Type | Template Implications | CTA Strategy |
|---|---|---|
| Informational | Data-heavy, educational content | Newsletter, related content |
| Commercial investigation | Comparison tables, pros/cons | Free trial, demo |
| Transactional | Pricing, availability, features | Buy now, sign up |
| Navigational | Brand-specific, direct answer | Product page link |
Data Pipeline Architecture
Pipeline Design
[Data Source] → [Extraction] → [Transformation] → [Enrichment] → [Validation] → [Template Population] → [Quality Check] → [Publish]
Data Quality Gates
Every record must pass these gates before page generation:
| Gate | Check | Failure Action |
|---|---|---|
| Completeness | All required fields populated | Skip page, log for manual review |
| Accuracy | Data matches source, no staleness > 90 days | Flag for refresh |
| Uniqueness | No duplicate records | Merge or deduplicate |
| Minimum richness | Page will have > 300 words of unique content | Skip or enrich |
| Legal compliance | Data usage rights verified | Block publication |
Update Cadence
| Data Type | Recommended Update Frequency | Staleness Penalty |
|---|---|---|
| Pricing data | Weekly | High (users notice immediately) |
| Company/product data | Monthly | Medium |
| Statistical data | Quarterly | Low if year-tagged |
| Glossary/educational | Semi-annually | Very low |
| Location data | Monthly | Medium (closures, address changes) |
Template Design System
Page Architecture
Every programmatic page must have these zones:
┌─────────────────────────────────────┐
│ Zone 1: Unique Header │ H1 with target keyword, unique intro paragraph
├─────────────────────────────────────┤
│ Zone 2: Primary Data Section │ The core data/content for this specific page
├─────────────────────────────────────┤
│ Zone 3: Contextual Analysis │ Insights, comparisons, trends specific to this entity
├─────────────────────────────────────┤
│ Zone 4: Related Data │ Adjacent data points that add depth
├─────────────────────────────────────┤
│ Zone 5: Internal Navigation │ Related pages, breadcrumbs, category links
├─────────────────────────────────────┤
│ Zone 6: CTA │ Conversion element matched to intent
└─────────────────────────────────────┘
Uniqueness Requirements
Each page MUST have at least 3 of these 5 uniqueness sources:
- Unique data points -- Numbers, facts, or attributes specific to this entity
- Conditional content blocks -- Sections that appear/disappear based on data attributes
- Calculated insights -- Derived metrics (percentages, comparisons, rankings)
- Contextual recommendations -- "If X, then Y" advice blocks based on the data
- User-generated content -- Reviews, comments, or community contributions
URL Structure
Always use subfolders. Never subdomains for pSEO.
| Pattern | URL Template | Example |
|---|---|---|
| Location | /[service]/[city]/ |
/coworking/austin/ |
| Comparison | /compare/[a]-vs-[b]/ |
/compare/notion-vs-confluence/ |
| Integration | /integrations/[partner]/ |
/integrations/slack/ |
| Glossary | /glossary/[term]/ |
/glossary/churn-rate/ |
| Persona | /[product]-for-[audience]/ |
/crm-for-real-estate/ |
Quality Control Framework
Pre-Publication QA Checklist
Content Quality:
- Each page has > 300 words of unique content (not counting shared template elements)
- H1 is unique and contains the target keyword
- Meta title is unique (< 60 chars) and meta description is unique (< 155 chars)
- No broken data references (empty fields rendered as "N/A" or blank)
- At least 2 conditional content blocks triggered per page
- No duplicate pages targeting the same keyword
Technical SEO:
- Canonical tag points to self
- Hreflang tags if multilingual
- Schema markup renders without errors
- Page loads in < 3 seconds
- Mobile responsive
Internal Linking:
- Breadcrumb trail is complete
- 3-5 related pages linked contextually
- Hub page links to this page
- No orphan pages in the set
Thin Content Detection
Run this check against every generated page:
| Signal | Threshold | Action |
|---|---|---|
| Unique word count | < 200 unique words | Block publication |
| Content similarity to another page in set | > 80% Jaccard similarity | Merge or differentiate |
| Data fields populated | < 60% of template fields | Skip or enrich |
| User time-on-page (post-launch) | < 15 seconds average | Review and improve |
| Bounce rate (post-launch) | > 85% | Review intent match |
Internal Linking Architecture
Hub-and-Spoke Model
┌─────────┐
│ HUB │ /coworking/
│ PAGE │ (ranks for "coworking spaces")
└────┬────┘
┌──────────────┼──────────────┐
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ SPOKE 1 │ │ SPOKE 2 │ │ SPOKE 3 │
│ /austin/│ │ /denver/│ │ /seattle/│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
Cross-links between related spokes
Linking rules:
- Hub links DOWN to every spoke (or top 50 spokes if > 200 pages)
- Every spoke links UP to the hub
- Spokes link ACROSS to 3-5 related spokes (geographic proximity, thematic similarity)
- Deep pages link UP to their spoke AND the hub
- Cross-silo links only when contextually genuine
Pagination for Large Sets
If a hub page has > 50 spokes, implement paginated sub-hubs:
/coworking/ → Top cities + browse by state
/coworking/california/ → All California cities
/coworking/california/page/2/ → Paginated if > 25 cities
Indexation Strategy
Crawl Budget Management
| Page Set Size | Strategy |
|---|---|
| < 500 pages | Single XML sitemap, submit all |
| 500-5,000 | Segmented sitemaps by category |
| 5,000-50,000 | Segmented sitemaps + priority scoring + IndexNow |
| 50,000+ | Programmatic sitemap generation + crawl budget monitoring + strategic noindex |
Indexation Priority
| Priority | Pages | Action |
|---|---|---|
| P0 | Hub pages | Submit immediately, internal link from homepage |
| P1 | Head-volume spokes (top 10%) | Submit in first sitemap batch |
| P2 | Torso-volume spokes | Submit in second batch, 1-2 weeks later |
| P3 | Long-tail spokes | Submit gradually over 4-6 weeks |
| P4 | Zero-volume pages | Noindex unless data is uniquely valuable |
IndexNow Integration
For large-scale updates, use IndexNow to notify search engines immediately:
POST https://api.indexnow.org/indexnow
{
"host": "yoursite.com",
"key": "your-api-key",
"urlList": ["https://yoursite.com/page1", "https://yoursite.com/page2"]
}
Launch Sequence
Phase 1: Pilot (Week 1-2)
- Deploy 20-50 pages from head-volume tier
- Submit sitemap with pilot pages only
- Monitor indexation rate daily
- Check for crawl errors in Search Console
Phase 2: Scale (Week 3-6)
- Deploy remaining torso-volume pages in batches of 100-500
- Add cross-links between deployed pages
- Monitor thin content warnings
- Track impressions in Search Console
Phase 3: Long-Tail (Week 7-12)
- Deploy long-tail pages
- Noindex zero-volume pages (keep them crawlable but not indexed)
- Begin link acquisition outreach for hub pages
Phase 4: Optimization (Ongoing)
- A/B test template variations on head-volume pages
- Refresh stale data quarterly
- Add conditional content blocks based on engagement data
- Monitor for keyword cannibalization across the set
Post-Launch Optimization
Metrics Dashboard
| Metric | Frequency | Target |
|---|---|---|
| Indexation rate | Weekly | > 90% of submitted pages indexed within 60 days |
| Organic impressions | Weekly | Trending up month-over-month |
| Average position (by tier) | Bi-weekly | Head pages: top 10; Torso: top 30 |
| Click-through rate | Monthly | > 3% for head pages |
| Bounce rate | Monthly | < 70% |
| Conversion rate | Monthly | > 1% for transactional intent |
| Pages per session | Monthly | > 1.5 |
Optimization Playbook
| Signal | Diagnosis | Action |
|---|---|---|
| Indexed but not ranking | Content quality or authority gap | Enrich content, build links to hub |
| Ranking but low CTR | Title/description not compelling | A/B test meta titles |
| Ranking but high bounce | Intent mismatch or thin content | Audit against search intent, add data |
| Deindexed after initial indexing | Thin content penalty | Improve uniqueness, reduce similarity |
| Crawled but not indexed | Quality threshold not met | Add more unique content per page |
Anti-Patterns and Penalty Avoidance
| Anti-Pattern | Why It Fails | Prevention |
|---|---|---|
| City-name swapping | Same content + different city = doorway page penalty | Each location page needs unique local data |
| Keyword stuffing in templates | Unnatural density triggers spam filters | Keep keyword density 1-2%, write naturally |
| Generating pages for zero-demand queries | Wastes crawl budget, signals low quality | Validate demand before generating |
| No internal links to pSEO pages | Orphan pages get deprioritized | Connect every page to the hub-spoke structure |
| Stale data never refreshed | Users lose trust, Google notices | Set update cadence per data type |
| All pages identical structure | Lack of variation signals automation to Google | Use 3-5 template variants |
Decision Matrix: Build vs Skip
Score each dimension 1-5, then apply the threshold.
| Dimension | Weight | 1 (Skip) | 5 (Build) |
|---|---|---|---|
| Search demand | 30% | < 1K aggregate monthly | > 50K aggregate monthly |
| Data quality | 25% | Public/scraped, easily replicated | Proprietary, defensible |
| Competitive gap | 20% | DR gap > 40, strong incumbents | DR gap < 15, weak/no incumbents |
| Template feasibility | 15% | Each page needs unique editorial | Clean template fits all variations |
| Business alignment | 10% | No conversion path from these pages | Direct path to core product |
Scoring guide:
- 4.0+ weighted average: Build immediately
- 3.0-3.9: Build if resources allow, validate with pilot first
- 2.0-2.9: Invest in data quality or authority first
- < 2.0: Do not build
Output Artifacts
| Artifact | Format | Description |
|---|---|---|
| Opportunity Analysis | Markdown table | Keyword patterns x volume x data source x difficulty x business alignment |
| Playbook Recommendation | Decision matrix | If/then mapping with rationale and real-world examples |
| Page Template Specification | Annotated wireframe (markdown) | URL pattern, zone structure, uniqueness sources, conditional logic |
| Data Pipeline Spec | Flow diagram (text) | Source > extraction > transformation > validation > publication |
| Quality Scorecard | Checklist + thresholds | Pre-publication QA gates with pass/fail criteria |
| Indexation Plan | Phased timeline | Priority tiers, sitemap structure, crawl budget allocation |
| Post-Launch Dashboard | Metric table | KPIs, targets, review cadence, optimization triggers |
Related Skills
- seo-audit -- Run after pSEO pages are live to diagnose indexation issues, thin content warnings, or ranking problems across the page set.
- schema-markup -- Add structured data to pSEO templates (Product, FAQ, LocalBusiness) for rich snippet eligibility at scale.
- site-architecture -- Plan hub-and-spoke structure and crawl budget management for large pSEO deployments (500+ pages).
- competitor-alternatives -- Use the Comparisons playbook when building "[X] vs [Y]" pages; competitor-alternatives has dedicated comparison page frameworks.
- content-creator -- Use when individual pages in the set need editorial-quality unique content beyond template generation.
Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| Google deindexed 90%+ of pSEO pages | Thin content — pages have insufficient unique content (<300 words) or >80% similarity | Increase unique content per page to 500+ words; ensure 30-40% differentiation between pages |
| Pages indexed but getting zero traffic | Pages target zero-volume keywords or content does not match search intent | Validate demand before generating; noindex zero-volume pages; verify intent alignment |
| "Doorway pages" manual action in GSC | Template pages with only variable substitution (city name swap) and no unique value | Add genuinely unique data per page — local stats, specific recommendations, conditional content blocks |
| Hub page ranks but spokes do not | Spokes missing inbound internal links or hub not linking down to spokes | Verify bidirectional hub-spoke linking; add contextual cross-links between related spokes |
| Crawl budget exhausted before all pages indexed | Too many pages submitted at once or low-value pages consuming crawl resources | Phase deployment in batches of 100-500; use tiered indexation with strategic noindex |
| Content similarity too high across page set | Template lacks conditional content blocks; only variable substitution used | Add 3-5 conditional content sections per template that change based on data attributes |
| AI content detection flagging pSEO pages | Over-reliance on AI generation without human editorial review | Use AI for data enrichment only, not full content generation; sample 5-10% for quality review |
Success Criteria
- Indexation rate: 90%+ of submitted pages indexed within 60 days of deployment
- Content uniqueness: Every page has 500+ unique words with <40% similarity to any other page in the set (2026 Google threshold)
- Head keyword rankings: Top 10% of pages (by volume) ranking in top 30 within 90 days
- Organic traffic growth: Page set generating measurable organic traffic within 60 days of full deployment
- Thin content rate: Zero pages flagged as thin content in Google Search Console
- Bounce rate: Below 70% average across the page set (indicating intent match)
- Conversion rate: 1%+ for transactional intent pages, measurable lead capture for informational pages
Scope & Limitations
In scope:
- Keyword pattern mining and volume distribution analysis
- Data pipeline design (source > extraction > transformation > validation > publication)
- Template architecture with uniqueness requirements
- Quality control frameworks including thin content detection
- Hub-and-spoke internal linking for pSEO page sets
- Phased indexation strategy and crawl budget management
- Post-launch optimization and monitoring dashboards
Out of scope:
- Individual editorial content creation (use Content Production)
- Data collection or web scraping implementation
- CMS or static site generator setup and configuration
- Server infrastructure for large-scale deployments
- Paid acquisition for pSEO pages
- Legal compliance for data usage rights
Known limitations:
- Google's 2026 helpful content system can deindex large page sets retroactively if quality drops below threshold
- Programmatic SEO at Tier F data (public/scraped) carries high penalty risk regardless of template quality
- Engagement metrics (bounce rate, time on page) now influence indexation decisions for pSEO pages
- AI content detection is improving — fully automated content generation without human oversight is increasingly risky
- Travel site case study: 50,000 city-swap pages had 98% deindexed within 3 months (per 2025 industry data)
Scripts
# Analyze keyword patterns for pSEO opportunities
python scripts/keyword_pattern_miner.py --keywords keywords.csv --json
# Score page templates for content quality and uniqueness
python scripts/template_scorer.py --template template.html --data sample_data.json
# Validate data quality for pSEO data pipeline
python scripts/data_validator.py --file data.csv --rules rules.json --json
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
churn-prevention
SaaS churn reduction covering cancel flow design, dynamic save offers, exit survey architecture, dunning sequences, payment recovery, win-back campaigns, and churn impact modeling.
popup-cro
Popup and modal optimization for conversion. Covers exit-intent, slide-ins, banners, timing optimization, frequency capping, audience targeting, compliance, and A/B testing frameworks for lead capture, promotions, and announcements.
competitor-alternatives
Competitor comparison and alternative page creation for SEO and sales enablement. Covers 4 page formats (singular alternative, plural alternatives, vs pages, competitor vs competitor), content architecture, research methodology, and centralized competitor data management.
contract-and-proposal-writer
Generate production-ready business documents including freelance contracts, project proposals, SOWs, NDAs, and MSAs with jurisdiction-aware clauses. Covers US (Delaware), EU (GDPR), UK, and DACH (German law) legal frameworks. Includes contract templates, clause libraries, and DOCX conversion. Use when starting client engagements, writing proposals, drafting partnership agreements, or needing GDPR-compliant data processing addenda.
pricing-strategy
SaaS pricing design and optimization covering value metric selection, tier architecture, price point research, pricing page design, price increase execution, and competitive pricing analysis.
referral-program
Referral and affiliate program design covering referral loop architecture, incentive design, trigger moment optimization, viral coefficient modeling, affiliate program structure, and optimization playbook.
Didn't find tool you were looking for?