SEO First Principles for PolicyEngine Web Apps

Use this skill when auditing or building web applications that need to be discoverable via search engines. PolicyEngine apps are typically React SPAs deployed to GitHub Pages, often served both standalone and embedded as iframes in policyengine.org research pages.

How Search Engines Work

Google does three things:

Crawl — Googlebot fetches your URL and downloads the raw HTML response
Index — It reads that HTML, understands what the page is about, stores it
Rank — When someone searches, it picks the best matching pages from its index

Your job is to make all three steps easy. If any step fails, your page won't appear in search results.

Principle 1: Google Reads HTML, Not Your Screen

This is the most critical issue for React SPAs. When Googlebot fetches a client-side rendered app, the raw HTML it receives is little more than:

html

<div id="root"></div>

Content generated by JavaScript may or may not be indexed. Google can execute JS, but:

Pages enter a "render queue" (delays of hours to days)
A JS error during rendering can leave the page with no indexable content
JS-rendered content tends to be indexed later and less reliably than static HTML

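Test: Run curl -s YOUR_URL | grep -c '<h1' to count the lines of raw HTML that contain an <h1> tag (with or without attributes). If the result is 0, Google likely can't see your content without rendering JavaScript.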
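The same check can be scripted for CI. The sketch below is a minimal, hypothetical helper (auditRawHtml is an illustrative name, not part of any PolicyEngine package) that inspects the raw HTML string a crawler would receive before any JavaScript runs. It uses regular expressions rather than a real HTML parser, so treat the result as a rough signal only.

```javascript
// Hypothetical helper: inspects raw HTML as fetched, before any JS executes.
function auditRawHtml(html) {
  // Count opening <h1> tags, with or without attributes.
  const h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  // Detect an empty React mount point such as <div id="root"></div>.
  const emptyRoot = /<div\s+id=["']root["']\s*>\s*<\/div>/i.test(html);
  return { h1Count, emptyRoot, likelyVisible: h1Count > 0 && !emptyRoot };
}
```

In Node 18+, the input string can come from (await fetch(url)).text().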

Solutions (ranked by effectiveness):

Approach	Description	Effort
SSR (Next.js, Remix)	Server renders full HTML on each request	High (framework change)
SSG (Static Site Generation)	Pre-build HTML at deploy time	Medium
Pre-rendering	Render SPA to static HTML for crawlers	Low-Medium
Meta tags only	At minimum, add static meta tags to index.html	Low

For PolicyEngine calculator apps, pre-rendering or SSG is the sweet spot. The form/landing page is static content; only results are dynamic.

Principle 2: One URL = One Page = One Topic

Google ranks pages, not websites. Each URL you want to rank for needs:

Its own distinct URL path
Unique title and description
Content relevant to that specific topic

Hash Routing is Invisible to SEO

https://example.com/#country=us&region=CA&head=45000

Google ignores everything after #, so every hash variation resolves to the same URL and a single indexed page.

Path-Based URLs Are Crawlable

https://example.com/us/california?head=45000

Query parameters (?key=value) ARE seen by Google (though they may be treated as variants). Path segments (/us/california) are treated as distinct pages.
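As an illustration of moving state out of the hash, here is a minimal sketch that rewrites a hash-state URL into a path-based one. The function name hashToPathUrl and the choice to promote country and region to lowercase path segments are assumptions for this example; a real app would map region codes to its own slugs (for example CA to california).

```javascript
// Hypothetical migration helper: promotes `country` and `region` from the
// hash to path segments and keeps the remaining keys as query parameters.
function hashToPathUrl(input) {
  const url = new URL(input);
  const params = new URLSearchParams(url.hash.replace(/^#/, ""));
  const segments = ["country", "region"]
    .map((key) => params.get(key))
    .filter(Boolean)
    .map((value) => encodeURIComponent(value.toLowerCase()));
  params.delete("country");
  params.delete("region");

  const result = new URL(url.origin);
  // Preserve any base path, such as a GitHub Pages project subpath.
  result.pathname = url.pathname.replace(/\/$/, "") + "/" + segments.join("/");
  result.search = params.toString();
  return result.toString();
}

console.log(hashToPathUrl("https://example.com/#country=us&region=CA&head=45000"));
// prints "https://example.com/us/ca?head=45000"
```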

Principle 3: The Standalone vs Iframe Dual-Mode Problem

PolicyEngine apps often run in two modes:

Standalone — Deployed on GitHub Pages (e.g., policyengine.github.io/us-marriage-incentive/)
Embedded — Iframed inside policyengine.org research pages (e.g., policyengine.org/us/research/marriage)

SEO Implications

Concern	Standalone	Embedded (iframe)
Indexed by Google?	Yes (if crawlable)	No — Google indexes the parent page, not iframe content
Needs meta tags?	Yes — this is the version Google sees	No — parent page provides meta tags
Needs canonical URL?	Yes — should point to itself OR the parent page	N/A
Needs robots.txt?	Yes	N/A (inherits from parent domain)
Needs sitemap?	Yes	N/A (parent sitemap covers parent pages)

Canonical URL Strategy

If the primary audience should find the app via policyengine.org:

html

<link rel="canonical" href="https://policyengine.org/us/research/marriage">

This tells Google: "The real version of this content lives on policyengine.org. Index that one."

If the standalone version is the primary:

html

<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">

Rule: Every page needs exactly one canonical URL. Without it, Google may index both versions and split ranking signals between them (often described as duplicate-content dilution).
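The "exactly one canonical" rule can be checked against built HTML. The following is a rough sketch with hypothetical function names; it scans link tags with regular expressions, which is adequate for a simple index.html but is not a substitute for a real HTML parser.

```javascript
// Hypothetical check: collects the href of every <link rel="canonical"> tag.
function findCanonicalHrefs(html) {
  const hrefs = [];
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    if (!/\brel\s*=\s*["']canonical["']/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']*)["']/i);
    if (href) hrefs.push(href[1]);
  }
  return hrefs;
}

// Returns "ok", "missing", or "multiple" for the rule stated above.
function checkCanonical(html) {
  const count = findCanonicalHrefs(html).length;
  if (count === 0) return "missing";
  return count === 1 ? "ok" : "multiple";
}
```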

Detecting Iframe Mode in Code

Most PolicyEngine apps already detect this:

javascript

const isEmbedded = window.self !== window.top;

SEO-relevant behavior should NOT depend on this check — meta tags, titles, and structured data must be present in the static HTML regardless of runtime mode.
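A small sketch of that separation, assuming hypothetical helper names: the embed check drives only presentational choices, while titles, meta tags, and structured data stay in the static index.html. The win parameter is a window-like object so the logic can also run outside a browser.

```javascript
// Hypothetical helpers; pass `window` in the browser.
function isEmbedded(win) {
  return win.self !== win.top;
}

// Only presentational chrome branches on embed mode. Nothing SEO-relevant
// (title, meta tags, JSON-LD) is produced or changed here.
function chromeOptions(win) {
  const embedded = isEmbedded(win);
  return { showSiteHeader: !embedded, showFooter: !embedded };
}
```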

Principle 4: Required Meta Tags

Every PolicyEngine web app needs these in index.html:

Critical (must have)

html

<!-- Basic SEO -->
<title>US Marriage Tax Calculator: Penalty &amp; Bonus | PolicyEngine</title>
<meta name="description" content="Calculate how marriage affects your taxes and government benefits. See your marriage penalty or bonus across income levels for any US state.">
<link rel="canonical" href="https://policyengine.github.io/us-marriage-incentive/">

<!-- Open Graph (Facebook, LinkedIn, Slack, iMessage previews) -->
<meta property="og:type" content="website">
<meta property="og:title" content="US Marriage Tax Calculator">
<meta property="og:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta property="og:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">
<meta property="og:url" content="https://policyengine.github.io/us-marriage-incentive/">
<meta property="og:site_name" content="PolicyEngine">

<!-- Twitter / X -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="US Marriage Tax Calculator">
<meta name="twitter:description" content="Calculate how marriage affects your taxes and government benefits.">
<meta name="twitter:image" content="https://policyengine.github.io/us-marriage-incentive/og-image.png">
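Because the Open Graph and Twitter tags repeat the same title, description, and image, one option is to generate them from a single config at build time. This is a minimal sketch under that assumption; renderSocialTags and escapeAttr are hypothetical names, and the output is meant to be pasted or injected into the static index.html.

```javascript
// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical generator for the social preview tags listed above.
function renderSocialTags({ title, description, url, image }) {
  const tags = [
    ["property", "og:type", "website"],
    ["property", "og:title", title],
    ["property", "og:description", description],
    ["property", "og:image", image],
    ["property", "og:url", url],
    ["property", "og:site_name", "PolicyEngine"],
    ["name", "twitter:card", "summary_large_image"],
    ["name", "twitter:title", title],
    ["name", "twitter:description", description],
    ["name", "twitter:image", image],
  ];
  return tags
    .map(([attr, key, content]) => `<meta ${attr}="${key}" content="${escapeAttr(content)}">`)
    .join("\n");
}
```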

Important (should have)

html

<!-- Structured Data (JSON-LD) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "name": "US Marriage Tax Calculator",
  "description": "Calculate how marriage affects your taxes and government benefits.",
  "url": "https://policyengine.github.io/us-marriage-incentive/",
  "applicationCategory": "FinanceApplication",
  "operatingSystem": "Web",
  "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" },
  "author": {
    "@type": "Organization",
    "name": "PolicyEngine",
    "url": "https://policyengine.org"
  }
}
</script>

<!-- Theme color for mobile browsers — use --pe-color-primary-500 value -->
<meta name="theme-color" content="#319795">
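If the JSON-LD block is generated rather than hand-written, it helps to serialize it with JSON.stringify and escape "<" so that no value can close the script tag early. A minimal sketch, assuming a hypothetical buildJsonLdScript helper that mirrors the fields shown above:

```javascript
// Hypothetical generator for the WebApplication JSON-LD block.
function buildJsonLdScript({ name, description, url }) {
  const data = {
    "@context": "https://schema.org",
    "@type": "WebApplication",
    name,
    description,
    url,
    applicationCategory: "FinanceApplication",
    operatingSystem: "Web",
    offers: { "@type": "Offer", price: "0", priceCurrency: "USD" },
    author: { "@type": "Organization", name: "PolicyEngine", url: "https://policyengine.org" },
  };
  // Escape "<" so a value containing "</script>" cannot end the tag early.
  // JSON parsers read \u003c back as "<".
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}
```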

Title Tag Rules

Under 60 characters (Google typically truncates longer titles in search results)
Primary keyword first: "US Marriage Tax Calculator" not "PolicyEngine — Calculator"
Include the brand at the end: "... | PolicyEngine"
Be specific: "Marriage Tax Calculator" not "Calculator"
Every page needs a unique title
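The mechanical parts of these rules can be linted. The sketch below is a hypothetical checkTitle helper covering length, brand placement, and keyword-first ordering; specificity and uniqueness still need human review.

```javascript
// Hypothetical lint for the title rules above. Returns a list of problems.
function checkTitle(title, brand = "PolicyEngine") {
  const problems = [];
  if (title.length >= 60) problems.push(`too long (${title.length} chars)`);
  if (!title.endsWith(`| ${brand}`)) problems.push("brand should come last");
  if (title.startsWith(brand)) problems.push("lead with the primary keyword, not the brand");
  return problems;
}
```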

Meta Description Rules

150-160 characters
Include a call to action ("Calculate", "Find out", "Compare")
Include primary keywords naturally
Describe what the user gets, not what the app is
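A similar lint works for descriptions. This sketch assumes a hypothetical checkDescription helper; the default call-to-action list contains only the three examples named above and can be extended.

```javascript
// Hypothetical lint for the meta description rules above.
function checkDescription(text, ctas = ["calculate", "find out", "compare"]) {
  const problems = [];
  if (text.length < 150 || text.length > 160) {
    problems.push(`length ${text.length} is outside 150-160`);
  }
  const lower = text.toLowerCase();
  if (!ctas.some((cta) => lower.includes(cta))) problems.push("no call to action");
  return problems;
}
```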

OG Image Rules

Dimensions: 1200 x 630 pixels
Format: PNG or JPG
Must be an absolute URL (not a relative path)
Should visually represent the app (screenshot, branded graphic)
Place in public/ directory so it's available at build output root
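The URL and format rules can be verified without downloading the image. A minimal sketch, assuming a hypothetical checkOgImageUrl helper; it relies on the standard URL constructor, which throws for relative paths when no base is given. Checking the 1200 x 630 dimensions would require reading the file and is left out.

```javascript
// Hypothetical check for the og:image value. Returns a list of problems.
function checkOgImageUrl(value) {
  let url;
  try {
    url = new URL(value); // throws on relative paths such as "/og-image.png"
  } catch {
    return ["not an absolute URL"];
  }
  const problems = [];
  if (url.protocol !== "https:" && url.protocol !== "http:") problems.push("not an http(s) URL");
  if (!/\.(png|jpe?g)$/i.test(url.pathname)) problems.push("not a PNG or JPG");
  return problems;
}
```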

Principle 5: Crawlability Files

robots.txt

Place in public/robots.txt (Vite copies public/ contents to build root):

User-agent: *
Allow: /

Sitemap: https://YOUR_DOMAIN/sitemap.xml

sitemap.xml

Place in public/sitemap.xml:

xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://YOUR_DOMAIN/</loc>
    <lastmod>2025-01-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>

For apps with multiple distinct pages, add each URL as a separate <url> entry. Keep <lastmod> accurate; Google ignores <changefreq>, so that element is optional.
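For multi-page apps, the sitemap can be generated from a list of URLs at build time. This is a minimal sketch with hypothetical function names; it escapes XML special characters in each URL and emits one <url> entry per page.

```javascript
// Escapes the five XML special characters.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical generator: `urls` are absolute URLs, `lastmod` is YYYY-MM-DD.
function buildSitemap(urls, lastmod) {
  const entries = urls.map((loc) =>
    [
      "  <url>",
      `    <loc>${escapeXml(loc)}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      "  </url>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n") + "\n";
}
```

Writing the returned string to public/sitemap.xml before the build keeps it in the build output root.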

.nojekyll (GitHub Pages only)

Always add an empty .nojekyll file to public/ when deploying to GitHub Pages. Without it, GitHub runs Jekyll processing, which can mangle static files such as sitemap.xml and robots.txt and prevent Google from reading them.

GitHub Pages Sitemap Limitation

Known issue: Google Search Console often fails to fetch sitemaps from .github.io domains. Even with a valid, accessible sitemap.xml, Search Console may show "Sitemap could not be read." This appears to be a limitation of the shared .github.io infrastructure rather than a problem with your sitemap; the exact cause is not confirmed.

Workarounds:

Custom domain (recommended): Set up a CNAME (e.g., tool.policyengine.org) pointing to org.github.io. Sitemaps work correctly on custom domains.
URL Inspection: Manually request indexing via Search Console's URL Inspection tool — this works even when sitemap fetching doesn't.
Backlinks: Links from other indexed sites (e.g., policyengine.org embedding/linking to the tool) will cause Google to discover and index the page without needing the sitemap.

Google Search Console

After deploying robots.txt and sitemap.xml:

Go to https://search.google.com/search-console
Add your domain as a property (use URL prefix for GitHub Pages subpaths)
Verify ownership (HTML meta tag is easiest for GitHub Pages)
Submit your sitemap URL
If on .github.io: expect "Sitemap could not be read" — use URL Inspection instead
Monitor indexing status and search performance

Principle 6: Page Speed is a Ranking Factor

Google measures Core Web Vitals:

Metric	What it measures	Target	How to test
LCP (Largest Contentful Paint)	Time to render the biggest visible element	< 2.5s	PageSpeed Insights
INP (Interaction to Next Paint; replaced FID in March 2024)	How quickly the page responds to user interactions	< 200ms	PageSpeed Insights
CLS (Cumulative Layout Shift)	Visual stability (how much things jump around)	< 0.1	PageSpeed Insights

Common performance issues in PolicyEngine apps

Issue	Impact	Fix
Plotly.js bundle (~3-5 MB)	Destroys LCP	Replace with Recharts (~120 KB) or lazy-load aggressively
No code splitting	Entire app loads before anything renders	Use React.lazy() + Suspense
Unoptimized images	Slow LCP	Use WebP, proper sizing, lazy loading
No font preloading	Layout shift when fonts load	Use <link rel="preconnect"> + display=swap
Large JSON data files	Blocks initial render	Lazy-load data or move to API calls
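The lazy-loading fixes in the table share one idea: defer the expensive load until first use, then reuse the result. The sketch below shows that idea in framework-agnostic form with a hypothetical lazyOnce helper; it is not React.lazy itself, and the data path in the comment is an illustrative assumption.

```javascript
// Hypothetical helper: wraps a loader so it runs at most once, on first call.
function lazyOnce(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      pending = Promise.resolve(loader()).catch((error) => {
        pending = null; // allow a retry after a failed load
        throw error;
      });
    }
    return pending;
  };
}

// Example wiring (hypothetical path): nothing is fetched until the results
// view first calls loadResults().
// const loadResults = lazyOnce(() => fetch("/data/results.json").then((r) => r.json()));
```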

Test: Run PageSpeed Insights at https://pagespeed.web.dev/ with your deployed URL.

Principle 7: Semantic HTML Structure

Search engines use heading hierarchy to understand page structure.

Heading Rules

One H1 per page — the primary topic/title
H2 for major sections — visible subsections of the page
H3 for subsections within H2 — never skip levels (H1 -> H3 is wrong)
Headings should contain keywords naturally
Don't use headings for styling — use CSS instead
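The first three rules can be checked mechanically. This is a rough sketch using a hypothetical findHeadingIssues helper; it reads heading levels from an HTML string with a regular expression, so it should be run against pre-rendered output rather than JSX source.

```javascript
// Hypothetical lint: flags a missing or repeated h1 and skipped heading levels.
function findHeadingIssues(html) {
  const levels = [...html.matchAll(/<h([1-6])[\s>]/gi)].map((m) => Number(m[1]));
  const issues = [];
  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) issues.push(`expected exactly one h1, found ${h1Count}`);
  for (let i = 1; i < levels.length; i++) {
    // Going deeper by more than one level skips a heading level.
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`h${levels[i - 1]} is followed by h${levels[i]} (skipped level)`);
    }
  }
  return issues;
}
```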

Semantic Elements

Prefer semantic HTML over generic divs:

html

<main>          <!-- Primary content -->
<nav>           <!-- Navigation -->
<section>       <!-- Thematic grouping -->
<article>       <!-- Self-contained content -->
<aside>         <!-- Sidebar/supplementary -->
<footer>        <!-- Footer content -->

Principle 8: Accessibility Helps SEO

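Google has not confirmed accessibility as a direct ranking factor, but accessible markup gives crawlers more machine-readable text (alt text, labels) and overlaps with page-experience best practices. Key items: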

Images need alt text — describes the image for screen readers and Google
Form inputs need labels — <label> elements or aria-label
Interactive elements need focus states — keyboard navigation support
Color contrast — text must be readable (WCAG AA: 4.5:1 ratio)
ARIA attributes — decorative elements get aria-hidden="true"
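The alt-text item is the easiest to automate. A minimal sketch, assuming a hypothetical findImagesMissingAlt helper that scans an HTML string; an empty alt="" counts as present, since that is the convention for decorative images.

```javascript
// Hypothetical lint: returns every <img> tag that has no alt attribute at all.
function findImagesMissingAlt(html) {
  const imgTags = html.match(/<img\b[^>]*>/gi) || [];
  // Require whitespace before "alt" so attributes like data-alt do not count.
  return imgTags.filter((tag) => !/\salt\s*=/i.test(tag));
}
```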

Principle 9: Analytics and Measurement

You cannot improve what you cannot measure.

Required

Google Search Console — Shows search queries, click-through rates, indexing errors. Free, essential.
Google Analytics (GA4) — Shows traffic sources, user behavior, conversions. Free.

Integration

Add to index.html before </head>:

html

<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'G-XXXXXXXXXX');
</script>

Replace G-XXXXXXXXXX with the actual GA4 measurement ID.

Principle 10: Off-Page SEO (Backlinks)

One of the strongest ranking signals is other websites linking to yours. For PolicyEngine:

Blog posts on policyengine.org that link to the calculator
Research papers that reference the tool
Social media shares (indirect — drives traffic, which signals value)
Being embedded on policyengine.org research pages (the parent page links/iframes the app)

This is not something the plugin can check, but it's important context: the policyengine.org embedding strategy provides backlink authority that standalone GitHub Pages deployments lack.

Quick Reference: SEO Audit Checklist

Critical (app won't rank without these)

<title> is descriptive, < 60 chars, includes keywords
<meta name="description"> is 150-160 chars with call to action
<link rel="canonical"> points to preferred URL
og:title, og:description, og:image, og:url present
twitter:card, twitter:title, twitter:description, twitter:image present
robots.txt exists in build output root
sitemap.xml exists in build output root
.nojekyll exists in public/ (GitHub Pages only; prevents Jekyll from mangling static files)
Page content is in the HTML (not only JS-rendered)
<html lang="en"> attribute set

Important (significantly helps ranking)

JSON-LD structured data present
One H1 per page, proper heading hierarchy (H1 > H2 > H3)
Core Web Vitals passing (LCP < 2.5s, INP < 200ms, CLS < 0.1; INP replaced FID in March 2024)
Total JS bundle < 500 KB (excluding lazy-loaded chunks)
Google Analytics installed
Custom 404 page exists
<meta name="theme-color"> set

Nice to Have

PWA manifest.json
Social sharing buttons
Topic-specific landing pages with unique URLs
FAQ schema markup

Related Skills

policyengine-recharts-skill — Recharts reduces bundle size by ~3 MB vs Plotly (major SEO perf win)
policyengine-design-skill — PolicyEngine branding for OG images and visual identity
