Agent skill
seo-checklist
SEO first principles for PolicyEngine web applications - meta tags, crawlability, performance, and dual-mode (standalone + iframe) considerations
Install this agent skill to your Project
npx add-skill https://github.com/PolicyEngine/policyengine-claude/tree/main/skills/technical-patterns/seo-checklist-skill
SEO First Principles for PolicyEngine Web Apps
Use this skill when auditing or building web applications that need to be discoverable via search engines. PolicyEngine apps are typically React SPAs deployed to GitHub Pages, often served both standalone and embedded as iframes in policyengine.org research pages.
How Search Engines Work
Google does three things:
- Crawl — Googlebot fetches your URL and downloads the raw HTML response
- Index — It reads that HTML, understands what the page is about, stores it
- Rank — When someone searches, it picks the best matching pages from its index
Your job is to make all three steps easy. If any step fails, your page won't appear in search results.
Principle 1: Google Reads HTML, Not Your Screen
This is the most critical issue for React SPAs. When Googlebot fetches a client-side rendered app, the raw HTML it receives is often little more than:
<div id="root"></div>
All content generated by JavaScript may or may not be indexed. Google can execute JS but:
- Pages enter a "render queue" (delays of hours to days)
- JS errors = no content indexed
- Google deprioritizes JS-rendered content vs static HTML
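Test: Run curl -s YOUR_URL | grep -c '<h1' (the pattern omits the closing > so that tags with attributes, such as <h1 class="title">, also match). If the result is 0, the raw HTML contains no H1 and Google likely can't see your content without rendering JavaScript.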
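The same check can be scripted so it runs in CI. This is a minimal sketch, assuming the SPA mounts into an element with id="root" as in the example above; the function name is illustrative, and the regexes are a rough heuristic rather than a real HTML parser.

```javascript
// Rough check of what a crawler receives before any JavaScript runs.
// Assumption: the app mounts into <div id="root">, as in the example above.
function inspectRawHtml(html) {
  // Count opening <h1> tags, with or without attributes.
  const h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  // Detect an empty mount point such as <div id="root"></div>.
  const emptyRoot = /<div\s+id=["']root["']\s*>\s*<\/div>/i.test(html);
  return { h1Count, emptyRoot };
}
```

In Node 18 or later, the input could come from `await (await fetch(url)).text()`. A result with `h1Count === 0` and `emptyRoot === true` suggests the page depends entirely on client-side rendering.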
Solutions (ranked by effectiveness):
| Approach | Description | Effort |
|---|---|---|
| SSR (Next.js, Remix) | Server renders full HTML on each request | High (framework change) |
| SSG (Static Site Generation) | Pre-build HTML at deploy time | Medium |
| Pre-rendering | Render SPA to static HTML for crawlers | Low-Medium |
| Meta tags only | At minimum, add static meta tags to index.html | Low |
For PolicyEngine calculator apps, pre-rendering or SSG is the sweet spot. The form/landing page is static content; only results are dynamic.
Principle 2: One URL = One Page = One Topic
Google ranks pages, not websites. Each URL you want to rank for needs:
- Its own distinct URL path
- Unique title and description
- Content relevant to that specific topic
Hash Routing is Invisible to SEO
https://example.com/#country=us&region=CA&head=45000
Google treats everything after # as the same page. All hash variations = one URL = one indexed page.
Path-Based URLs Are Crawlable
https://example.com/us/california?head=45000
Query parameters (?key=value) ARE seen by Google (though they may be treated as variants). Path segments (/us/california) are treated as distinct pages.
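The difference between the two URL styles can be illustrated with the standard URL API. The sketch below is hypothetical: it assumes the hash holds key=value pairs, that "country" and "region" become path segments, and that the app is served from the domain root. None of these names come from an actual PolicyEngine router.

```javascript
// What a crawler treats as the page identity: the URL without its fragment.
function crawlerVisibleUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  return url.href;
}

// Hypothetical migration helper: move hash state into path segments plus a query string.
// Assumes "country" and "region" map to path segments and the app lives at the domain root.
function hashStateToPathUrl(rawUrl) {
  const url = new URL(rawUrl);
  const state = new URLSearchParams(url.hash.slice(1));
  const pathKeys = ["country", "region"];
  const segments = pathKeys
    .map((key) => state.get(key))
    .filter(Boolean)
    .map((value) => encodeURIComponent(value.toLowerCase()));
  for (const key of pathKeys) state.delete(key);
  const query = state.toString();
  return `${url.origin}/${segments.join("/")}${query ? `?${query}` : ""}`;
}
```

For the hash URL shown earlier, `crawlerVisibleUrl` returns `https://example.com/` regardless of the hash contents, while `hashStateToPathUrl` returns `https://example.com/us/ca?head=45000`.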
Principle 3: The Standalone vs Iframe Dual-Mode Problem
PolicyEngine apps often run in two modes:
- Standalone: deployed on GitHub Pages (e.g., policyengine.github.io/us-marriage-incentive/)
- Embedded: iframed inside policyengine.org research pages (e.g., policyengine.org/us/research/marriage)
SEO Implications
| Concern | Standalone | Embedded (iframe) |
|---|---|---|
| Indexed by Google? | Yes (if crawlable) | No — Google indexes the parent page, not iframe content |
| Needs meta tags? | Yes — this is the version Google sees | No — parent page provides meta tags |
| Needs canonical URL? | Yes — should point to itself OR the parent page | N/A |
| Needs robots.txt? | Yes | N/A (inherits from parent domain) |
| Needs sitemap? | Yes | N/A (parent sitemap covers parent pages) |
Canonical URL Strategy
If the primary audience should find the app via policyengine.org:
<link rel="canonical" href="https://policyengine.org/us/research/marriage">
This tells Google: "The real version of this content lives on policyengine.org. Index that one."
If the standalone version is the primary:
<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">
Rule: Every page needs exactly one canonical URL. Without it, Google may index both versions and split your ranking power between them (called "duplicate content dilution").
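The "exactly one canonical" rule is easy to check against built HTML. This is a sketch with illustrative function names; it uses regexes rather than an HTML parser, so it assumes quoted attribute values and no ">" inside them.

```javascript
// Collect every canonical URL declared in an HTML document.
function findCanonicalUrls(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  return linkTags
    .filter((tag) => /\brel=["']canonical["']/i.test(tag))
    .map((tag) => (tag.match(/\bhref=["']([^"']*)["']/i) || [])[1])
    .filter(Boolean);
}

// Report whether the page follows the one-canonical rule.
function checkCanonical(html) {
  const urls = findCanonicalUrls(html);
  if (urls.length === 0) return "missing canonical";
  if (urls.length > 1) return "multiple canonicals";
  return "ok";
}
```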
Detecting Iframe Mode in Code
Most PolicyEngine apps already detect this:
const isEmbedded = window.self !== window.top;
SEO-relevant behavior should NOT depend on this check — meta tags, titles, and structured data must be present in the static HTML regardless of runtime mode.
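One way to respect this is to restrict the embedded flag to presentation decisions. The sketch below is hypothetical: the option names are invented for illustration, and the window object is passed in as a parameter only so the logic can be exercised outside a browser.

```javascript
// Same check as above, with the window object injected for testability.
function detectEmbedded(win) {
  return win.self !== win.top;
}

// Hypothetical presentation-only use of the flag: hide site chrome when iframed.
// Titles, meta tags, and structured data stay in the static HTML and never read this flag.
function layoutOptions(win) {
  const embedded = detectEmbedded(win);
  return { showSiteHeader: !embedded, showSiteFooter: !embedded };
}
```

In the browser this would be called as `layoutOptions(window)`.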
Principle 4: Required Meta Tags
Every PolicyEngine web app needs these in index.html:
Critical (must have)
<!-- Basic SEO -->
<title>US Marriage Tax Calculator | PolicyEngine</title>
<meta name="description" content="Calculate how marriage affects your taxes and government benefits. See your marriage penalty or bonus across income levels for any US state.">
<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">
<!-- Open Graph (Facebook, LinkedIn, Slack, iMessage previews) -->
<meta property="og:type" content="website">
<meta property="og:title" content="US Marriage Tax Calculator">
<meta property="og:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta property="og:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">
<meta property="og:url" content="https://policyengine.github.io/us-marriage-incentive/">
<meta property="og:site_name" content="PolicyEngine">
<!-- Twitter / X -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="US Marriage Tax Calculator">
<meta name="twitter:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta name="twitter:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">
Important (should have)
<!-- Structured Data (JSON-LD) -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "WebApplication",
"name": "US Marriage Tax Calculator",
"description": "Calculate how marriage affects your taxes and government benefits.",
"url": "https://policyengine.github.io/us-marriage-incentive/",
"applicationCategory": "FinanceApplication",
"operatingSystem": "Web",
"offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" },
"author": {
"@type": "Organization",
"name": "PolicyEngine",
"url": "https://policyengine.org"
}
}
</script>
<!-- Theme color for mobile browsers — use --pe-color-primary-500 value -->
<meta name="theme-color" content="#319795">
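For projects that maintain several apps, the JSON-LD block above could be generated from a small config object at build time rather than hand-edited. This is a sketch only: the function name and config shape are assumptions, and the fixed fields simply mirror the example.

```javascript
// Build the JSON-LD script tag from a small config object.
// Fixed fields mirror the example above; the config shape is illustrative.
function buildWebAppJsonLd({ name, description, url }) {
  const data = {
    "@context": "https://schema.org",
    "@type": "WebApplication",
    name,
    description,
    url,
    applicationCategory: "FinanceApplication",
    operatingSystem: "Web",
    offers: { "@type": "Offer", price: "0", priceCurrency: "USD" },
    author: { "@type": "Organization", name: "PolicyEngine", url: "https://policyengine.org" },
  };
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}
```

Using JSON.stringify avoids the hand-written syntax errors (trailing commas, unescaped quotes) that would make the structured data unparseable.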
Title Tag Rules
- Under 60 characters (Google truncates after that)
- Primary keyword first: "US Marriage Tax Calculator" not "PolicyEngine — Calculator"
- Include the brand at the end: "... | PolicyEngine"
- Be specific: "Marriage Tax Calculator" not "Calculator"
- Every page needs a unique title
Meta Description Rules
- 150-160 characters
- Include a call to action ("Calculate", "Find out", "Compare")
- Include primary keywords naturally
- Describe what the user gets, not what the app is
OG Image Rules
- Dimensions: 1200 x 630 pixels
- Format: PNG or JPG
- Must be an absolute URL (not a relative path)
- Should visually represent the app (screenshot, branded graphic)
- Place in the public/ directory so it's available at the build output root
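The title, description, and OG image rules above are mechanical enough to lint. This is a minimal sketch, assuming the three values have already been extracted from index.html; the function name and message wording are illustrative.

```javascript
// Lint head copy against the title, description, and OG image rules above.
function checkHeadCopy({ title, description, ogImage }) {
  const issues = [];
  if (title.length > 60) issues.push(`title is ${title.length} chars (limit 60)`);
  if (!title.endsWith("| PolicyEngine")) issues.push("title should end with the brand");
  if (description.length < 150 || description.length > 160) {
    issues.push(`description is ${description.length} chars (target 150-160)`);
  }
  if (!/^https?:\/\//i.test(ogImage)) issues.push("og:image must be an absolute URL");
  if (!/\.(png|jpe?g)$/i.test(ogImage)) issues.push("og:image should be PNG or JPG");
  return issues;
}
```

An empty array means the copy passes these checks. Rules that need human judgment, such as keyword placement and whether the image represents the app, are not covered.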
Principle 5: Crawlability Files
robots.txt
Place in public/robots.txt (Vite copies public/ contents to build root):
User-agent: *
Allow: /
Sitemap: https://YOUR_DOMAIN/sitemap.xml
sitemap.xml
Place in public/sitemap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://YOUR_DOMAIN/</loc>
<lastmod>2025-01-01</lastmod>
<changefreq>monthly</changefreq>
</url>
</urlset>
For apps with multiple distinct pages, add each URL as a separate <url> entry.
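For multi-page apps, the sitemap can be generated from a list of URLs instead of maintained by hand. This is a sketch under stated assumptions: the function names are illustrative, and the caller supplies the URL list and the lastmod date.

```javascript
// Escape the five characters that must be entity-encoded in sitemap values.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap.xml string with one <url> entry per page.
function buildSitemapXml(urls, lastmod) {
  const entries = urls.map((loc) =>
    [
      "  <url>",
      `    <loc>${escapeXml(loc)}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      "    <changefreq>monthly</changefreq>",
      "  </url>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n") + "\n";
}
```

A build script could write the result to public/sitemap.xml before the Vite build runs.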
.nojekyll (GitHub Pages only)
Always add an empty .nojekyll file to public/ when deploying to GitHub Pages. Without it, GitHub runs Jekyll processing, which can alter or skip static files such as sitemap.xml and robots.txt and prevent Google from reading them.
GitHub Pages Sitemap Limitation
Known issue: Google Search Console often fails to fetch sitemaps from .github.io domains. Even with a valid, accessible sitemap.xml, Search Console may show "Sitemap could not be read." This is reportedly a GitHub infrastructure limitation (automated Googlebot fetches appear to be blocked), not a problem with the sitemap file itself.
Workarounds:
- Custom domain (recommended): Set up a CNAME (e.g., tool.policyengine.org) pointing to org.github.io. Sitemaps work correctly on custom domains.
- URL Inspection: Manually request indexing via Search Console's URL Inspection tool. This works even when sitemap fetching doesn't.
- Backlinks: Links from other indexed sites (e.g., policyengine.org embedding/linking to the tool) will cause Google to discover and index the page without needing the sitemap.
Google Search Console
After deploying robots.txt and sitemap.xml:
- Go to https://search.google.com/search-console
- Add your domain as a property (use URL prefix for GitHub Pages subpaths)
- Verify ownership (HTML meta tag is easiest for GitHub Pages)
- Submit your sitemap URL
- If on .github.io: expect "Sitemap could not be read" and use URL Inspection instead
- Monitor indexing status and search performance
Principle 6: Page Speed is a Ranking Factor
Google measures Core Web Vitals:
| Metric | What | Target | How to test |
|---|---|---|---|
| LCP (Largest Contentful Paint) | Time to render biggest visible element | < 2.5s | PageSpeed Insights |
| INP (Interaction to Next Paint) | Time from a user interaction until the next frame is painted; replaced FID (First Input Delay) as a Core Web Vital in March 2024 | < 200ms | PageSpeed Insights |
| CLS (Cumulative Layout Shift) | Visual stability (how much things jump around) | < 0.1 | PageSpeed Insights |
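These targets can be turned into a simple pass/fail check. This is a minimal sketch, assuming measurements are taken from a tool such as PageSpeed Insights and passed in as plain numbers; the function and key names are illustrative. For responsiveness it checks INP at 200 ms, the metric that superseded FID as a Core Web Vital in March 2024.

```javascript
// "Good" thresholds for Core Web Vitals: LCP and INP in milliseconds, CLS unitless.
const VITALS_GOOD_THRESHOLDS = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// Return the names of the measured metrics that exceed their "good" threshold.
// Metrics missing from the input are skipped rather than treated as failures.
function failingVitals(measured) {
  return Object.keys(VITALS_GOOD_THRESHOLDS).filter(
    (metric) => metric in measured && measured[metric] > VITALS_GOOD_THRESHOLDS[metric]
  );
}
```

For example, `failingVitals({ lcpMs: 3100, inpMs: 150, cls: 0.05 })` returns `["lcpMs"]`.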
Common performance issues in PolicyEngine apps
| Issue | Impact | Fix |
|---|---|---|
| Plotly.js bundle (~3-5 MB) | Destroys LCP | Replace with Recharts (~120 KB) or lazy-load aggressively |
| No code splitting | Entire app loads before anything renders | Use React.lazy() + Suspense |
| Unoptimized images | Slow LCP | Use WebP, proper sizing, lazy loading |
| No font preloading | Layout shift when fonts load | Use <link rel="preconnect"> + display=swap |
| Large JSON data files | Blocks initial render | Lazy-load data or move to API calls |
Test: Run PageSpeed Insights at https://pagespeed.web.dev/ with your deployed URL.
Principle 7: Semantic HTML Structure
Search engines use heading hierarchy to understand page structure.
Heading Rules
- One H1 per page — the primary topic/title
- H2 for major sections — visible subsections of the page
- H3 for subsections within H2 — never skip levels (H1 -> H3 is wrong)
- Headings should contain keywords naturally
- Don't use headings for styling — use CSS instead
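The first three rules can be checked automatically against rendered HTML. This is a regex-based sketch with an illustrative function name; it reads headings in document order and ignores headings hidden by CSS or nested inside sectioning roots, which a real audit tool would handle.

```javascript
// Check for exactly one h1 and no skipped heading levels (e.g. h1 followed by h3).
function findHeadingIssues(html) {
  const levels = [...html.matchAll(/<h([1-6])[\s>]/gi)].map((m) => Number(m[1]));
  const issues = [];
  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) issues.push(`expected exactly one h1, found ${h1Count}`);
  for (let i = 1; i < levels.length; i++) {
    // Going deeper by more than one level skips a heading level.
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`h${levels[i - 1]} is followed by h${levels[i]} (skipped level)`);
    }
  }
  return issues;
}
```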
Semantic Elements
Prefer semantic HTML over generic divs:
<main> <!-- Primary content -->
<nav> <!-- Navigation -->
<section> <!-- Thematic grouping -->
<article> <!-- Self-contained content -->
<aside> <!-- Sidebar/supplementary -->
<footer> <!-- Footer content -->
Principle 8: Accessibility Helps SEO
Accessibility and SEO overlap heavily: the markup that helps screen readers also helps Google understand a page. Key items:
- Images need alt text: describes the image for screen readers and Google
- Form inputs need labels: <label> elements or aria-label
- Interactive elements need focus states: keyboard navigation support
- Color contrast: text must be readable (WCAG AA: 4.5:1 ratio)
- ARIA attributes: decorative elements get aria-hidden="true"
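The alt-text item is the easiest of these to automate. This is a rough sketch with an illustrative function name; it treats an empty alt="" as present, since that is the conventional marking for decorative images, and it assumes no ">" appears inside attribute values.

```javascript
// Return the <img> tags that have no alt attribute at all.
// An empty alt="" counts as present (conventional for decorative images).
function findImagesMissingAlt(html) {
  const imgTags = html.match(/<img\b[^>]*>/gi) || [];
  return imgTags.filter((tag) => !/\salt\s*=/i.test(tag));
}
```

Contrast ratios and focus states need a browser-based tool and are not covered by a static check like this.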
Principle 9: Analytics and Measurement
You cannot improve what you cannot measure.
Required
- Google Search Console — Shows search queries, click-through rates, indexing errors. Free, essential.
- Google Analytics (GA4) — Shows traffic sources, user behavior, conversions. Free.
Integration
Add to index.html before </head>:
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXXXX');
</script>
Replace G-XXXXXXXXXX with the actual GA4 measurement ID.
Principle 10: Off-Page SEO (Backlinks)
Links from other websites are among the strongest ranking signals. For PolicyEngine, the main sources are:
- Blog posts on policyengine.org that link to the calculator
- Research papers that reference the tool
- Social media shares (indirect — drives traffic, which signals value)
- Being embedded on policyengine.org research pages (the parent page links/iframes the app)
This is not something the plugin can check, but it's important context: the policyengine.org embedding strategy provides backlink authority that standalone GitHub Pages deployments lack.
Quick Reference: SEO Audit Checklist
Critical (app won't rank without these)
- <title> is descriptive, < 60 chars, includes keywords
- <meta name="description"> is 150-160 chars with call to action
- <link rel="canonical"> points to preferred URL
- og:title, og:description, og:image, og:url present
- twitter:card, twitter:title, twitter:description, twitter:image present
- robots.txt exists in build output root
- sitemap.xml exists in build output root
- .nojekyll exists in public/ (GitHub Pages only; prevents XML mangling)
- Page content is in the HTML (not only JS-rendered)
- <html lang="en"> attribute set
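The tag-presence items in this list can be checked against the built index.html. This is a sketch, not a full audit: the function names are illustrative, the regexes assume quoted attribute values, and it only confirms that each tag exists with a non-empty value. File-level items (robots.txt, sitemap.xml, .nojekyll) would need a separate file-system check.

```javascript
// True if a <meta> tag with the given attr="key" exists and has non-empty content.
function hasMetaTag(html, attr, key) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  const keyPattern = new RegExp(`\\b${attr}=["']${key}["']`, "i");
  return metaTags.some(
    (tag) => keyPattern.test(tag) && /\bcontent=["'][^"']+["']/i.test(tag)
  );
}

// Return the names of critical head tags that are missing from the HTML.
function findMissingCriticalTags(html) {
  const missing = [];
  if (!/<title>[^<]+<\/title>/i.test(html)) missing.push("title");
  if (!/<html\b[^>]*\blang=["'][^"']+["']/i.test(html)) missing.push("html lang");
  if (!/<link\b[^>]*\brel=["']canonical["']/i.test(html)) missing.push("canonical");
  if (!hasMetaTag(html, "name", "description")) missing.push("description");
  for (const key of ["og:title", "og:description", "og:image", "og:url"]) {
    if (!hasMetaTag(html, "property", key)) missing.push(key);
  }
  for (const key of ["twitter:card", "twitter:title", "twitter:description", "twitter:image"]) {
    if (!hasMetaTag(html, "name", key)) missing.push(key);
  }
  return missing;
}
```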
Important (significantly helps ranking)
- JSON-LD structured data present
- One H1 per page, proper heading hierarchy (H1 > H2 > H3)
- Core Web Vitals passing (LCP < 2.5s, INP < 200ms, CLS < 0.1; INP replaced FID in March 2024)
- Total JS bundle < 500 KB (excluding lazy-loaded chunks)
- Google Analytics installed
- Custom 404 page exists
- <meta name="theme-color"> set
Nice to Have
- PWA manifest.json
- Social sharing buttons
- Topic-specific landing pages with unique URLs
- FAQ schema markup
Related Skills
- policyengine-recharts-skill — Recharts reduces bundle size by ~3 MB vs Plotly (major SEO perf win)
- policyengine-design-skill — PolicyEngine branding for OG images and visual identity