Agent skill

seo-checklist

SEO first principles for PolicyEngine web applications - meta tags, crawlability, performance, and dual-mode (standalone + iframe) considerations

Install this agent skill to your Project

npx add-skill https://github.com/PolicyEngine/policyengine-claude/tree/main/skills/technical-patterns/seo-checklist-skill

SKILL.md

SEO First Principles for PolicyEngine Web Apps

Use this skill when auditing or building web applications that need to be discoverable via search engines. PolicyEngine apps are typically React SPAs deployed to GitHub Pages, often served both standalone and embedded as iframes in policyengine.org research pages.


How Search Engines Work

Google does three things:

  1. Crawl — Googlebot fetches your URL and downloads the raw HTML response
  2. Index — It reads that HTML, understands what the page is about, stores it
  3. Rank — When someone searches, it picks the best matching pages from its index

Your job is to make all three steps easy. If any step fails, your page won't appear in search results.


Principle 1: Google Reads HTML, Not Your Screen

This is the most critical issue for React SPAs. When Googlebot fetches a client-side rendered app, the raw HTML it receives is little more than:

html
<div id="root"></div>

Content generated by JavaScript may or may not be indexed. Google can execute JS, but:

  • Pages enter a "render queue", which can delay indexing (reported delays range from seconds to days)
  • A JS error during rendering can leave Google with an empty page
  • JS-rendered content tends to be discovered and refreshed more slowly than static HTML

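Test: Run curl -s YOUR_URL | grep -c '<h1'. If the result is 0, Google likely can't see your content without rendering JavaScript. (Matching '<h1' rather than '<h1>' also catches headings that carry attributes such as class.)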
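
The same check can be scripted. The sketch below is a rough heuristic, assuming the raw HTML has already been fetched as a string (for example with curl). `hasCrawlableHeading` is a hypothetical helper name, and the regex is a shortcut, not a full HTML parser.

```javascript
// Heuristic: does the raw (un-rendered) HTML contain an <h1> with visible text?
// `hasCrawlableHeading` is a hypothetical helper, not part of any library.
function hasCrawlableHeading(rawHtml) {
  // Match <h1 ...>...</h1>, allowing attributes and line breaks.
  const match = rawHtml.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
  if (!match) return false;
  // Strip nested tags and whitespace to see whether any text remains.
  const text = match[1].replace(/<[^>]*>/g, "").trim();
  return text.length > 0;
}

// An empty SPA shell versus a pre-rendered page (both strings are made up).
const spaShell = '<html><body><div id="root"></div></body></html>';
const prerendered =
  '<html><body><div id="root"><h1 class="title">US Marriage Tax Calculator</h1></div></body></html>';

console.log(hasCrawlableHeading(spaShell)); // false
console.log(hasCrawlableHeading(prerendered)); // true
```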

Solutions (ranked by effectiveness):

| Approach | Description | Effort |
| --- | --- | --- |
| SSR (Next.js, Remix) | Server renders full HTML on each request | High (framework change) |
| SSG (Static Site Generation) | Pre-build HTML at deploy time | Medium |
| Pre-rendering | Render SPA to static HTML for crawlers | Low-Medium |
| Meta tags only | At minimum, add static meta tags to index.html | Low |

For PolicyEngine calculator apps, pre-rendering or SSG is the sweet spot. The form/landing page is static content; only results are dynamic.


Principle 2: One URL = One Page = One Topic

Google ranks pages, not websites. Each URL you want to rank for needs:

  • Its own distinct URL path
  • Unique title and description
  • Content relevant to that specific topic

Hash Routing is Invisible to SEO

https://example.com/#country=us&region=CA&head=45000

Google ignores everything after #, so every hash variation resolves to the same URL and at most one indexed page.

Path-Based URLs Are Crawlable

https://example.com/us/california?head=45000

Query parameters (?key=value) ARE seen by Google (though they may be treated as variants). Path segments (/us/california) are treated as distinct pages.
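
As an illustration of moving state out of the hash, the sketch below rewrites a hash-style URL into a path-based one using the standard `URL` and `URLSearchParams` APIs. The function name `hashToPathUrl` and the choice of `country` and `region` as path segments are assumptions for this example. It lowercases the region code instead of mapping it to a state name such as `california`, which would need a lookup table.

```javascript
// Hypothetical helper: turn "#country=us&region=CA&head=45000" style state
// into "/us/ca?head=45000". The key names are assumptions for illustration.
function hashToPathUrl(hashUrl) {
  const url = new URL(hashUrl);
  // Parse the fragment (without the leading "#") as key=value pairs.
  const params = new URLSearchParams(url.hash.slice(1));

  // Promote the keys that identify a distinct topic to path segments.
  const segments = ["country", "region"]
    .map((key) => params.get(key))
    .filter(Boolean)
    .map((value) => encodeURIComponent(value.toLowerCase()));
  params.delete("country");
  params.delete("region");

  // Keep any existing base path, then append the new segments.
  const basePath = url.pathname.replace(/\/$/, "");
  const result = new URL(url.origin);
  result.pathname = basePath + "/" + segments.join("/");
  // Remaining keys stay as query parameters, which crawlers can see.
  result.search = params.toString();
  return result.toString();
}

console.log(hashToPathUrl("https://example.com/#country=us&region=CA&head=45000"));
// https://example.com/us/ca?head=45000
```

Existing hash links would still need a client-side redirect to the new paths so that shared URLs keep working.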


Principle 3: The Standalone vs Iframe Dual-Mode Problem

PolicyEngine apps often run in two modes:

  1. Standalone — Deployed on GitHub Pages (e.g., policyengine.github.io/us-marriage-incentive/)
  2. Embedded — Iframed inside policyengine.org research pages (e.g., policyengine.org/us/research/marriage)

SEO Implications

| Concern | Standalone | Embedded (iframe) |
| --- | --- | --- |
| Indexed by Google? | Yes (if crawlable) | No. Google indexes the parent page, not iframe content |
| Needs meta tags? | Yes. This is the version Google sees | No. The parent page provides meta tags |
| Needs canonical URL? | Yes. Should point to itself OR the parent page | N/A |
| Needs robots.txt? | Yes | N/A (the parent page is covered by the parent domain's robots.txt) |
| Needs sitemap? | Yes | N/A (parent sitemap covers parent pages) |

Canonical URL Strategy

If the primary audience should find the app via policyengine.org:

html
<link rel="canonical" href="https://policyengine.org/us/research/marriage">

This tells Google: "The real version of this content lives on policyengine.org. Index that one."

If the standalone version is the primary:

html
<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">

Rule: Every page needs exactly one canonical URL. Without it, Google may index both versions and split your ranking power between them (called "duplicate content dilution").
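
A minimal sketch of checking the "exactly one canonical" rule against a built index.html, assuming the file contents are available as a string. `findCanonicalUrls` and `checkCanonical` are hypothetical names, the regexes are a shortcut and not a real HTML parser, and flagging a relative href is an assumption of this sketch.

```javascript
// Collect the href of every <link rel="canonical"> in an HTML string.
function findCanonicalUrls(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  return linkTags
    .filter((tag) => /\brel\s*=\s*["']?canonical["']?/i.test(tag))
    .map((tag) => {
      const href = tag.match(/\bhref\s*=\s*["']([^"']*)["']/i);
      return href ? href[1] : null;
    });
}

// Summarize whether the page declares exactly one absolute canonical URL.
function checkCanonical(html) {
  const urls = findCanonicalUrls(html);
  if (urls.length === 0) return "missing canonical";
  if (urls.length > 1) return "multiple canonicals";
  if (!/^https?:\/\//i.test(urls[0] || "")) return "canonical is not an absolute URL";
  return "ok";
}

const head =
  '<head><link rel="stylesheet" href="app.css">' +
  '<link rel="canonical" href="https://policyengine.org/us/research/marriage"></head>';
console.log(checkCanonical(head)); // ok
```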

Detecting Iframe Mode in Code

Most PolicyEngine apps already detect this:

javascript
const isEmbedded = window.self !== window.top;

SEO-relevant behavior should NOT depend on this check — meta tags, titles, and structured data must be present in the static HTML regardless of runtime mode.


Principle 4: Required Meta Tags

Every PolicyEngine web app needs these in index.html:

Critical (must have)

html
<!-- Basic SEO -->
<title>US Marriage Tax Calculator: Penalty &amp; Bonus | PolicyEngine</title>
<meta name="description" content="Calculate how marriage affects your taxes and government benefits. See your marriage penalty or bonus across income levels for any US state.">
<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">

<!-- Open Graph (Facebook, LinkedIn, Slack, iMessage previews) -->
<meta property="og:type" content="website">
<meta property="og:title" content="US Marriage Tax Calculator">
<meta property="og:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta property="og:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">
<meta property="og:url" content="https://policyengine.github.io/us-marriage-incentive/">
<meta property="og:site_name" content="PolicyEngine">

<!-- Twitter / X -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="US Marriage Tax Calculator">
<meta name="twitter:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta name="twitter:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">

Important (should have)

html
<!-- Structured Data (JSON-LD) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "name": "US Marriage Tax Calculator",
  "description": "Calculate how marriage affects your taxes and government benefits.",
  "url": "https://policyengine.github.io/us-marriage-incentive/",
  "applicationCategory": "FinanceApplication",
  "operatingSystem": "Web",
  "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" },
  "author": {
    "@type": "Organization",
    "name": "PolicyEngine",
    "url": "https://policyengine.org"
  }
}
</script>

<!-- Theme color for mobile browsers — use --pe-color-primary-500 value -->
<meta name="theme-color" content="#319795">

Title Tag Rules

  • Under 60 characters (Google typically truncates longer titles in search results)
  • Primary keyword first: "US Marriage Tax Calculator" not "PolicyEngine — Calculator"
  • Include the brand at the end: "... | PolicyEngine"
  • Be specific: "Marriage Tax Calculator" not "Calculator"
  • Every page needs a unique title

Meta Description Rules

  • 150-160 characters
  • Include a call to action ("Calculate", "Find out", "Compare")
  • Include primary keywords naturally
  • Describe what the user gets, not what the app is

OG Image Rules

  • Dimensions: 1200 x 630 pixels
  • Format: PNG or JPG
  • Must be an absolute URL (not a relative path)
  • Should visually represent the app (screenshot, branded graphic)
  • Place in public/ directory so it's available at build output root
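
The title, description, and OG image rules above can be turned into a small lint step. This is a sketch under stated assumptions: `checkMetaRules` is a hypothetical helper, it only covers the rules that are mechanically checkable (length, brand suffix, absolute image URL), and the thresholds are the ones listed in this section.

```javascript
// Hypothetical lint helper for the rules listed above.
// Returns a list of human-readable warnings (empty when everything passes).
function checkMetaRules({ title, description, ogImage }) {
  const warnings = [];

  // Title: under 60 characters, brand at the end.
  if (title.length >= 60) {
    warnings.push(`title is ${title.length} chars (keep under 60)`);
  }
  if (!title.endsWith("| PolicyEngine")) {
    warnings.push('title should end with "| PolicyEngine"');
  }

  // Description: 150-160 characters.
  if (description.length < 150 || description.length > 160) {
    warnings.push(`description is ${description.length} chars (aim for 150-160)`);
  }

  // OG image: must be an absolute URL, not a relative path.
  if (!/^https?:\/\//i.test(ogImage)) {
    warnings.push("og:image must be an absolute URL");
  }

  return warnings;
}

// Example input that breaks three of the rules.
console.log(
  checkMetaRules({
    title: "Calculator",
    description: "Short.",
    ogImage: "/og-image.png",
  })
);
```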

Principle 5: Crawlability Files

robots.txt

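Place in public/robots.txt (Vite copies public/ contents to build root). Crawlers only read robots.txt from the root of a host, so on a GitHub Pages project site served under a subpath (e.g., policyengine.github.io/us-marriage-incentive/) the file is not consulted; it takes effect once the app is served from the root of a custom domain: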

User-agent: *
Allow: /

Sitemap: https://YOUR_DOMAIN/sitemap.xml

sitemap.xml

Place in public/sitemap.xml:

xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://YOUR_DOMAIN/</loc>
    <lastmod>2025-01-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>

For apps with multiple distinct pages, add each URL as a separate <url> entry.
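
For apps with more than a handful of pages, the file can be generated at build time instead of edited by hand. A minimal sketch, assuming the list of page URLs is known up front; `buildSitemap` and `escapeXml` are hypothetical helpers, and `YOUR_DOMAIN` is the same placeholder used above.

```javascript
// Escape the characters that are not allowed verbatim in XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap.xml string with one <url> entry per page URL.
function buildSitemap(urls, lastmod) {
  const entries = urls.map((loc) =>
    [
      "  <url>",
      `    <loc>${escapeXml(loc)}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      "  </url>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
    "",
  ].join("\n");
}

const xml = buildSitemap(
  ["https://YOUR_DOMAIN/", "https://YOUR_DOMAIN/us/california"],
  "2025-01-01"
);
console.log(xml);
```

Writing the returned string to public/sitemap.xml before the build (for example from a small Node script) keeps the sitemap in step with the routes.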

.nojekyll (GitHub Pages only)

Always add an empty .nojekyll file to public/ when deploying to GitHub Pages. Without it, GitHub Pages may run the site through Jekyll, which skips files and directories whose names start with an underscore and can alter files it processes, so build output such as sitemap.xml may not be served exactly as written.

GitHub Pages Sitemap Limitation

Known issue: Google Search Console frequently fails to fetch sitemaps from .github.io domains. Even with a valid, publicly accessible sitemap.xml, Search Console may show "Sitemap could not be read." This is commonly attributed to how GitHub's infrastructure handles automated Googlebot fetches, though the cause is not officially documented.

Workarounds:

  • Custom domain (recommended): Set up a CNAME (e.g., tool.policyengine.org) pointing to org.github.io. Sitemaps work correctly on custom domains.
  • URL Inspection: Manually request indexing via Search Console's URL Inspection tool — this works even when sitemap fetching doesn't.
  • Backlinks: Links from other indexed sites (e.g., policyengine.org embedding/linking to the tool) will cause Google to discover and index the page without needing the sitemap.

Google Search Console

After deploying robots.txt and sitemap.xml:

  1. Go to https://search.google.com/search-console
  2. Add your domain as a property (use URL prefix for GitHub Pages subpaths)
  3. Verify ownership (HTML meta tag is easiest for GitHub Pages)
  4. Submit your sitemap URL
  5. If on .github.io: you may see "Sitemap could not be read"; if so, use URL Inspection instead
  6. Monitor indexing status and search performance

Principle 6: Page Speed is a Ranking Factor

Google measures Core Web Vitals:

| Metric | What | Target | How to test |
| --- | --- | --- | --- |
| LCP (Largest Contentful Paint) | Time to render biggest visible element | < 2.5s | PageSpeed Insights |
| INP (Interaction to Next Paint; replaced FID as a Core Web Vital in March 2024) | How quickly the page responds to user interactions | < 200ms | PageSpeed Insights |
| CLS (Cumulative Layout Shift) | Visual stability (how much things jump around) | < 0.1 | PageSpeed Insights |

Common performance issues in PolicyEngine apps

| Issue | Impact | Fix |
| --- | --- | --- |
| Plotly.js bundle (~3-5 MB) | Destroys LCP | Replace with Recharts (~120 KB) or lazy-load aggressively |
| No code splitting | Entire app loads before anything renders | Use React.lazy() + Suspense |
| Unoptimized images | Slow LCP | Use WebP, proper sizing, lazy loading |
| No font preloading | Layout shift when fonts load | Use <link rel="preconnect"> + display=swap |
| Large JSON data files | Blocks initial render | Lazy-load data or move to API calls |

Test: Run PageSpeed Insights at https://pagespeed.web.dev/ with your deployed URL.
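
The audit checklist later in this document suggests keeping the initial JS bundle under 500 KB, excluding lazy-loaded chunks. The sketch below shows one way that budget could be checked in a build script. `checkBundleBudget` and the chunk list shape are assumptions for illustration (real sizes would come from the bundler's output), and the byte counts are made up.

```javascript
// 500 KB budget for JS that loads before first render.
const BUDGET_BYTES = 500 * 1024;

// Hypothetical helper: sum the non-lazy chunks and compare against the budget.
function checkBundleBudget(chunks, budgetBytes = BUDGET_BYTES) {
  const totalBytes = chunks
    .filter((chunk) => !chunk.lazy)
    .reduce((sum, chunk) => sum + chunk.bytes, 0);
  return { totalBytes, withinBudget: totalBytes < budgetBytes };
}

// Made-up sizes: a small entry chunk plus a large charting chunk
// that is only fetched on demand.
const chunks = [
  { name: "index.js", bytes: 180 * 1024, lazy: false },
  { name: "charts.js", bytes: 3 * 1024 * 1024, lazy: true },
];

console.log(checkBundleBudget(chunks).withinBudget); // true
```

If the charting chunk were loaded eagerly instead, the same check would fail, which is the case the "No code splitting" row describes.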


Principle 7: Semantic HTML Structure

Search engines use heading hierarchy to understand page structure.

Heading Rules

  • One H1 per page — the primary topic/title
  • H2 for major sections — visible subsections of the page
  • H3 for subsections within H2 — never skip levels (H1 -> H3 is wrong)
  • Headings should contain keywords naturally
  • Don't use headings for styling — use CSS instead
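
The first three heading rules are mechanical enough to check automatically. A rough sketch, assuming the rendered or pre-rendered HTML is available as a string; `findHeadingIssues` is a hypothetical helper and the regex only looks at opening heading tags.

```javascript
// Hypothetical helper: report multiple or missing <h1> elements and skipped levels.
function findHeadingIssues(html) {
  // Collect heading levels in document order, e.g. [1, 2, 3, 2].
  const levels = [...html.matchAll(/<h([1-6])\b/gi)].map((m) => Number(m[1]));
  const issues = [];

  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) {
    issues.push(`expected exactly one <h1>, found ${h1Count}`);
  }

  // Going deeper by more than one level at a time skips a level.
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`<h${levels[i - 1]}> is followed by <h${levels[i]}> (skipped level)`);
    }
  }
  return issues;
}

console.log(findHeadingIssues("<h1>Calculator</h1><h3>Results</h3>"));
// [ '<h1> is followed by <h3> (skipped level)' ]
```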

Semantic Elements

Prefer semantic HTML over generic divs:

html
<main>          <!-- Primary content -->
<nav>           <!-- Navigation -->
<section>       <!-- Thematic grouping -->
<article>       <!-- Self-contained content -->
<aside>         <!-- Sidebar/supplementary -->
<footer>        <!-- Footer content -->

Principle 8: Accessibility Helps SEO

Accessibility is not a confirmed direct ranking factor, but the same markup that helps assistive technology also helps Google understand the page. Key items:

  • Images need alt text — describes the image for screen readers and Google
  • Form inputs need labels: <label> elements or aria-label
  • Interactive elements need focus states — keyboard navigation support
  • Color contrast — text must be readable (WCAG AA: 4.5:1 ratio)
  • ARIA attributes — decorative elements get aria-hidden="true"
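
The 4.5:1 contrast threshold can be computed directly from two colors. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio formulas. The function names are hypothetical, and it assumes 6-digit hex colors such as `#319795` (no shorthand or alpha).

```javascript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color like "#319795".
function relativeLuminance(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 255;
  const g = (n >> 8) & 255;
  const b = n & 255;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(hexA, hexB) {
  const a = relativeLuminance(hexA);
  const b = relativeLuminance(hexB);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA for normal-size text requires at least 4.5:1.
function passesWcagAA(foreground, background) {
  return contrastRatio(foreground, background) >= 4.5;
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(1)); // "21.0"
console.log(passesWcagAA("#cccccc", "#ffffff")); // false
```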

Principle 9: Analytics and Measurement

You cannot improve what you cannot measure.

Required

  • Google Search Console — Shows search queries, click-through rates, indexing errors. Free, essential.
  • Google Analytics (GA4) — Shows traffic sources, user behavior, conversions. Free.

Integration

Add to index.html before </head>:

html
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'G-XXXXXXXXXX');
</script>

Replace G-XXXXXXXXXX with the actual GA4 measurement ID.


Principle 10: Off-Page SEO (Backlinks)

Links from other websites are among the most powerful ranking signals. For PolicyEngine:

  • Blog posts on policyengine.org that link to the calculator
  • Research papers that reference the tool
  • Social media shares (indirect — drives traffic, which signals value)
  • Being embedded on policyengine.org research pages (the parent page links/iframes the app)

This is not something the plugin can check, but it's important context: the policyengine.org embedding strategy provides backlink authority that standalone GitHub Pages deployments lack.


Quick Reference: SEO Audit Checklist

Critical (app won't rank without these)

  • <title> is descriptive, < 60 chars, includes keywords
  • <meta name="description"> is 150-160 chars with call to action
  • <link rel="canonical"> points to preferred URL
  • og:title, og:description, og:image, og:url present
  • twitter:card, twitter:title, twitter:description, twitter:image present
  • robots.txt exists in build output root
  • sitemap.xml exists in build output root
  • .nojekyll exists in public/ (GitHub Pages only; disables Jekyll processing)
  • Page content is in the HTML (not only JS-rendered)
  • <html lang="en"> attribute set
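
Several of the critical items can be checked against the built index.html without a browser. The sketch below is one possible shape for such a check, not the skill's actual audit logic: `findMissingTags` and `metaPattern` are hypothetical helpers, and the regexes assume conventionally written tags with quoted attribute values.

```javascript
// Build a pattern for <meta ATTR="VALUE" ...> with quoted attribute values.
function metaPattern(attr, value) {
  return new RegExp(`<meta\\b[^>]*\\b${attr}\\s*=\\s*["']${value}["'][^>]*>`, "i");
}

// Label/pattern pairs for the statically checkable items in the list above.
const CHECKS = [
  ["<title>", /<title\b[^>]*>\s*[^<\s]/i],
  ["meta description", metaPattern("name", "description")],
  ["canonical link", /<link\b[^>]*\brel\s*=\s*["']canonical["']/i],
  ["html lang", /<html\b[^>]*\blang\s*=\s*["'][^"']+["']/i],
  ...["og:title", "og:description", "og:image", "og:url"].map((property) => [
    property,
    metaPattern("property", property),
  ]),
  ...["twitter:card", "twitter:title", "twitter:description", "twitter:image"].map(
    (name) => [name, metaPattern("name", name)]
  ),
];

// Return the labels of every required tag that the HTML does not contain.
function findMissingTags(html) {
  return CHECKS.filter(([, pattern]) => !pattern.test(html)).map(([label]) => label);
}

// A bare SPA shell passes only the <title> check.
const shell = '<html><head><title>App</title></head><body><div id="root"></div></body></html>';
console.log(findMissingTags(shell));
```

The file checks (robots.txt, sitemap.xml, .nojekyll) and the "content is in the HTML" item need separate checks against the build output.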

Important (significantly helps ranking)

  • JSON-LD structured data present
  • One H1 per page, proper heading hierarchy (H1 > H2 > H3)
  • Core Web Vitals passing (LCP < 2.5s, INP < 200ms, CLS < 0.1)
  • Total JS bundle < 500 KB (excluding lazy-loaded chunks)
  • Google Analytics installed
  • Custom 404 page exists
  • <meta name="theme-color"> set

Nice to Have

  • PWA manifest.json
  • Social sharing buttons
  • Topic-specific landing pages with unique URLs
  • FAQ schema markup

Related Skills

  • policyengine-recharts-skill — Recharts reduces bundle size by ~3 MB vs Plotly (major SEO perf win)
  • policyengine-design-skill — PolicyEngine branding for OG images and visual identity
