Agent skill

html-structure-validate

Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.

Install this agent skill to your Project

npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/abejitsu/html-structure-validate

SKILL.md

HTML Structure Validate Skill

Purpose

This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.

The skill checks:

  • HTML5 compliance - Proper DOCTYPE, tags
  • Tag closure - All tags properly closed
  • Required elements - Meta tags, stylesheet links
  • Well-formedness - Valid structure

If validation fails, the pipeline STOPS and triggers a hook to notify the user.

This enforces the principle that Python validates: a deterministic script, not the AI model, decides whether the output passes.

What to Do

  1. Load HTML file to validate

    • Read 04_page_XX.html generated by AI skill
    • Verify file exists and is readable
    • Confirm file is text (not binary)
  2. Run validation checks

    • Check HTML5 structure compliance
    • Verify tag closure
    • Validate head section
    • Check required CSS link
    • Validate page container structure
  3. Generate validation report

    • Document all checks performed
    • List any errors found
    • Note warnings (non-blocking)
    • Record informational findings
  4. Save validation report as JSON

    • Save to: output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
    • Include timestamp
    • Include all check results
  5. Exit with appropriate code

    • Return 0 if VALID (continue pipeline)
    • Return 1 if INVALID (STOP pipeline, trigger hook)

Input Parameters

html_file: <str>         - Path to 04_page_XX.html
output_dir: <str>        - Directory for validation report
strict_mode: <bool>      - If true, warnings also fail (default: false)
page_number: <int>       - Page number (for reporting)
chapter: <int>           - Chapter number (for reporting)
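As a minimal sketch, the parameters above could be carried in a small dataclass. The class name `ValidationParams` and the `report_path` helper are hypothetical and not part of `validate_html.py`; only the field names, the `strict_mode` default, and the report file name come from this document.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ValidationParams:
    """Hypothetical container mirroring the documented input parameters."""

    html_file: Path            # path to 04_page_XX.html
    output_dir: Path           # directory for the validation report
    page_number: int           # page number (for reporting)
    chapter: int               # chapter number (for reporting)
    strict_mode: bool = False  # if true, warnings also fail

    def report_path(self) -> Path:
        # File name taken from the "Save validation report as JSON" step.
        return self.output_dir / "06_validation_structure.json"


params = ValidationParams(
    html_file=Path("04_page_16.html"),
    output_dir=Path("page_artifacts/page_16"),
    page_number=16,
    chapter=2,
)
print(params.report_path())
```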

Validation Checks

Check 1: DOCTYPE Declaration

Requirement: File must start with proper DOCTYPE

html
<!DOCTYPE html>

Check:

  • File contains <!DOCTYPE html> (case-insensitive)
  • DOCTYPE appears before any tags
  • DOCTYPE is on first line or near beginning

Error if: Missing or incorrect DOCTYPE
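One way the DOCTYPE check could look, as a sketch rather than the actual `validate_html.py` logic. The function name `check_doctype` is hypothetical, and tolerating a BOM, whitespace, and comments before the DOCTYPE is an assumption about what "near beginning" means.

```python
import re

# Assumption: a BOM, whitespace, and comments may precede the DOCTYPE.
_PREAMBLE = re.compile(r"^\ufeff?(?:\s|<!--.*?-->)*", re.DOTALL)
_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def check_doctype(html: str) -> tuple[str, str]:
    """Return (status, details) for the DOCTYPE check."""
    # Skip the tolerated preamble, then require the DOCTYPE immediately.
    rest = html[_PREAMBLE.match(html).end():]
    if _DOCTYPE.match(rest):
        return "PASS", "Valid HTML5 DOCTYPE found"
    if _DOCTYPE.search(html):
        return "FAIL", "DOCTYPE appears after other markup"
    return "FAIL", "Missing or incorrect DOCTYPE"
```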

Check 2: HTML Tags

Requirement: Proper <html> opening and closing tags

html
<html lang="en">
    ...
</html>

Checks:

  • <html> tag present
  • </html> closing tag present
  • Tags are properly paired
  • No unclosed <html> tags

Error if: Missing either tag or improperly paired

Check 3: Head Section

Requirement: Complete <head> section with metadata

html
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>...</title>
    <link rel="stylesheet" href="../../styles/main.css">
</head>

Checks:

  • <head> and </head> tags present
  • <meta charset="UTF-8"> present
  • <meta name="viewport"> present (warning if missing)
  • <title> tag with content present
  • CSS <link> tag present with href attribute

Error if: Missing charset, title, or CSS link

Warning if: Missing viewport meta tag
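The head checks can be sketched with the standard library's `html.parser`. The names `HeadInspector` and `check_head` are hypothetical, and the real tool may apply these rules differently; the error/warning split follows the rules stated for this check.

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Collects the facts Check 3 needs while the parser is inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.charset = None
        self.has_viewport = False
        self.has_stylesheet = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return
        elif tag == "meta" and "charset" in a:
            self.charset = (a["charset"] or "").lower()
        elif tag == "meta" and (a.get("name") or "").lower() == "viewport":
            self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link" and a.get("href"):
            if "stylesheet" in (a.get("rel") or "").lower().split():
                self.has_stylesheet = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_head(html: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the head section."""
    inspector = HeadInspector()
    inspector.feed(html)
    inspector.close()
    errors, warnings = [], []
    if inspector.charset != "utf-8":
        errors.append('Missing <meta charset="UTF-8">')
    if not inspector.title.strip():
        errors.append("Missing or empty <title>")
    if not inspector.has_stylesheet:
        errors.append("Missing stylesheet <link> with href")
    if not inspector.has_viewport:
        warnings.append("Missing viewport meta tag")
    return errors, warnings
```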

Check 4: Body Section

Requirement: Proper <body> tags with content

html
<body>
    <div class="page-container">
        <main class="page-content">
            ...
        </main>
    </div>
</body>

Checks:

  • <body> and </body> tags present
  • <div class="page-container"> present
  • <main class="page-content"> present inside container
  • Body contains substantial content (> 100 bytes)

Error if: Missing tags or required container divs

Check 5: Tag Closure Validation

Requirement: All tags must be properly closed

Checks for:

  • Unmatched opening tags (e.g., <p> without </p>)
  • Improper nesting (e.g., <p><h2>text</h2></p>)
  • Self-closing tags used correctly (e.g., <br/>, <img/>)
  • Comment blocks properly formatted (<!-- -->)

Validation method:

  • Parse HTML into tree structure
  • Verify all nodes properly matched
  • Check nesting doesn't violate HTML5 rules

Error if: Any unmatched or improperly nested tags
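A stack-based matcher is one common way to implement the tag-matching part of this check; the sketch below uses `html.parser` and is not necessarily how `validate_html.py` does it. `ClosureChecker` and `check_closure` are hypothetical names. It reports unmatched tags only and does not enforce HTML5 content-model rules such as a heading nested inside a paragraph.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML5.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClosureChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for every element still open
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Simplification: anything written as <tag/> is treated as closed.
        pass

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.errors.append(f"line {line}: stray </{tag}>")
            return
        # Pop back to the matching opener, reporting anything left open.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            self.errors.append(
                f"line {open_line}: <{open_tag}> not closed before </{tag}>")


def check_closure(html: str) -> list[str]:
    checker = ClosureChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"line {line}: <{tag}> never closed"
                for tag, line in checker.stack]
    return checker.errors + unclosed
```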

Check 6: Heading Tags (h1-h6)

Requirement: Valid heading hierarchy

html
<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>

Checks:

  • All heading tags properly closed
  • First heading should be h1 (warning if not)
  • Heading levels don't skip dramatically (h1 → h4 is suspicious)
  • All headings have text content (not empty)

Error if: Heading tags improperly closed

Warning if: Suspicious hierarchy
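The warning side of the heading check might be sketched as follows. `check_headings` is a hypothetical name, and the `max_jump=2` threshold is an assumption chosen so that h1 → h4 is flagged while h1 → h3 is not; the document only says that dramatic skips are suspicious. The regex only sees headings that are properly closed, so closure errors are out of scope here.

```python
import re

# Matches a properly closed heading and captures its level and inner HTML.
_HEADING = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
                      re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def check_headings(html: str, max_jump: int = 2) -> list[str]:
    """Return warnings about heading order and empty headings."""
    found = [(int(m.group(1)), _TAG.sub("", m.group(2)).strip())
             for m in _HEADING.finditer(html)]
    warnings = []
    if found and found[0][0] != 1:
        warnings.append(f"First heading is h{found[0][0]}, expected h1")
    for (prev, _), (cur, _) in zip(found, found[1:]):
        if cur - prev > max_jump:
            warnings.append(f"Heading jump h{prev} -> h{cur}")
    for level, text in found:
        if not text:
            warnings.append(f"Empty h{level} heading")
    return warnings
```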

Check 7: Content Structure

Requirement: Meaningful content in page container

Checks:

  • <main class="page-content"> contains elements
  • Content includes headings or paragraphs
  • No completely empty content area
  • Text nodes or elements present (> 100 words total)

Error if: No content or empty structure
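A rough sketch of the content check, counting words and elements inside `<main class="page-content">`. `MainContentCounter` and `check_content` are hypothetical names; the 100-word threshold comes from the check list above, and splitting on whitespace is an assumption about how words are counted.

```python
from html.parser import HTMLParser


class MainContentCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside <main class="page-content">
        self.words = 0
        self.elements = 0

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.elements += 1
            if tag == "main":
                self.depth += 1
        elif tag == "main":
            classes = (dict(attrs).get("class") or "").split()
            if "page-content" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "main":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.words += len(data.split())


def check_content(html: str, min_words: int = 100) -> tuple[str, str]:
    counter = MainContentCounter()
    counter.feed(html)
    counter.close()
    if counter.elements == 0 and counter.words == 0:
        return "FAIL", "Main content area is empty or missing"
    if counter.words <= min_words:
        return "FAIL", f"Only {counter.words} words in main content area"
    return "PASS", (f"Main content area contains {counter.words} words "
                    f"in {counter.elements} elements")
```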

Check 8: List Integrity

Requirement: All lists properly structured

Checks for each <ul> or <ol>:

  • List opening and closing tags matched
  • List contains <li> elements
  • All <li> tags properly closed
  • <li> count matches opening/closing pairs
  • No nested <ul> or <ol> improperly closed

Error if: Empty lists or unmatched <li> tags
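The list rules could be approximated as below. `ListChecker` and `check_lists` are hypothetical names; the sketch flags empty lists, unmatched list tags, and a mismatch between opened and closed `<li>` tags, which is a simplified reading of the checks above.

```python
from html.parser import HTMLParser


class ListChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_lists = []  # [tag, line, li_count] for each open list
        self.li_opened = 0
        self.li_closed = 0
        self.errors = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in ("ul", "ol"):
            self.open_lists.append([tag, line, 0])
        elif tag == "li":
            self.li_opened += 1
            if self.open_lists:
                self.open_lists[-1][2] += 1
            else:
                self.errors.append(f"line {line}: <li> outside a list")

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag == "li":
            self.li_closed += 1
        elif tag in ("ul", "ol"):
            if not self.open_lists or self.open_lists[-1][0] != tag:
                self.errors.append(f"line {line}: unmatched </{tag}>")
                return
            _, open_line, items = self.open_lists.pop()
            if items == 0:
                self.errors.append(f"line {open_line}: empty <{tag}>")


def check_lists(html: str) -> list[str]:
    checker = ListChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    errors += [f"line {line}: <{tag}> never closed"
               for tag, line, _ in checker.open_lists]
    if checker.li_opened != checker.li_closed:
        errors.append(f"{checker.li_opened} <li> opened but "
                      f"{checker.li_closed} closed")
    return errors
```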

Check 9: Image and Link Tags

Requirement: Image and link tags carry their required attributes, and self-closing tags are properly formatted

Checks:

  • All <img> tags have src and alt attributes
  • All <a> tags have valid href attributes
  • Image paths don't have obvious errors (no broken syntax)
  • Self-closing tags use proper syntax

Warning if: Images missing alt text or links missing href
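A small sketch of the attribute checks, again using `html.parser`. `AttrChecker` and `check_attributes` are hypothetical names. Treating an empty `alt=""` as acceptable (it is legitimate for decorative images) is an assumption about how strict the real check is.

```python
from html.parser import HTMLParser


class AttrChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            if not a.get("src"):
                self.warnings.append(f"line {line}: <img> missing src")
            if "alt" not in a:
                # Only the attribute's presence is required; alt="" passes.
                self.warnings.append(f"line {line}: <img> missing alt")
        elif tag == "a" and not a.get("href"):
            self.warnings.append(f"line {line}: <a> missing href")


def check_attributes(html: str) -> list[str]:
    checker = AttrChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings
```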

Check 10: Table Tags (if present)

Requirement: Proper table structure

Checks:

  • <table>, <tr>, <td>, <th> tags properly nested
  • All rows have consistent column counts
  • Table headers and body properly structured

Error if: Malformed table structure
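The column-consistency rule can be illustrated with a short sketch. `TableRows` and `check_tables` are hypothetical names. The sketch honours `colspan`, ignores `rowspan`, and assumes tables are not nested, all of which are simplifications rather than documented behaviour.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tables = []  # one list of per-row column counts per table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append(0)
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            span = dict(attrs).get("colspan") or "1"
            self.tables[-1][-1] += int(span) if span.isdigit() else 1


def check_tables(html: str) -> list[str]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    errors = []
    for index, rows in enumerate(parser.tables, start=1):
        if not rows:
            errors.append(f"table {index}: no rows")
        elif len(set(rows)) > 1:
            errors.append(
                f"table {index}: inconsistent column counts {rows}")
    return errors
```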

Validation Report Format

Output: 06_validation_structure.json

json
{
  "page": 16,
  "book_page": 17,
  "chapter": 2,
  "validation_type": "structure",
  "validation_timestamp": "2025-11-08T14:34:00Z",
  "overall_status": "PASS",
  "error_count": 0,
  "warning_count": 1,
  "checks_performed": [
    {
      "check_name": "DOCTYPE Declaration",
      "status": "PASS",
      "details": "Valid HTML5 DOCTYPE found"
    },
    {
      "check_name": "HTML Tags",
      "status": "PASS",
      "details": "Proper <html> opening and closing tags"
    },
    {
      "check_name": "Head Section",
      "status": "PASS",
      "details": "All required meta tags and title present"
    },
    {
      "check_name": "Body Section",
      "status": "PASS",
      "details": "Body and content structure valid"
    },
    {
      "check_name": "Tag Closure",
      "status": "PASS",
      "details": "All tags properly matched and closed"
    },
    {
      "check_name": "Heading Hierarchy",
      "status": "WARNING",
      "details": "4 headings found; first heading is h2 instead of h1"
    },
    {
      "check_name": "Content Structure",
      "status": "PASS",
      "details": "Main content area contains 245 words across 3 paragraphs"
    },
    {
      "check_name": "List Integrity",
      "status": "PASS",
      "details": "1 list with 3 items, all properly formed"
    },
    {
      "check_name": "Image Tags",
      "status": "PASS",
      "details": "No images on this page"
    },
    {
      "check_name": "Table Tags",
      "status": "PASS",
      "details": "No tables on this page"
    }
  ],
  "errors": [],
  "warnings": [
    {
      "check": "Heading Hierarchy",
      "message": "First heading is h2, typically should be h1 for page opening",
      "severity": "LOW"
    }
  ],
  "summary": {
    "total_checks": 10,
    "passed": 9,
    "failed": 0,
    "warnings": 1,
    "html_valid": true,
    "tags_matched": true,
    "content_substantial": true
  }
}
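A report with this shape could be assembled roughly as follows. `build_report` is a hypothetical helper, the `"FAIL"` status string is an assumption (the sample above only shows a passing run), and fields such as `book_page` are omitted for brevity.

```python
import json
from datetime import datetime, timezone


def build_report(page, chapter, checks, warnings=(), now=None):
    """Assemble a report dict from per-check results.

    checks: list of {"check_name", "status", "details"} dicts.
    warnings: list of {"check", "message", "severity"} dicts.
    """
    now = now or datetime.now(timezone.utc)
    failed = [c for c in checks if c["status"] == "FAIL"]
    passed = [c for c in checks if c["status"] == "PASS"]
    return {
        "page": page,
        "chapter": chapter,
        "validation_type": "structure",
        "validation_timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if failed else "PASS",
        "error_count": len(failed),
        "warning_count": len(warnings),
        "checks_performed": checks,
        "errors": [{"check": c["check_name"], "message": c["details"]}
                   for c in failed],
        "warnings": list(warnings),
        "summary": {
            "total_checks": len(checks),
            "passed": len(passed),
            "failed": len(failed),
            "warnings": len(warnings),
        },
    }


checks = [
    {"check_name": "DOCTYPE Declaration", "status": "PASS",
     "details": "Valid HTML5 DOCTYPE found"},
    {"check_name": "Tag Closure", "status": "FAIL",
     "details": "Unclosed <p>"},
]
report = build_report(16, 2, checks)
print(json.dumps(report, indent=2))
```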

Validation Rules

PASS Criteria

  • DOCTYPE present and valid
  • All required tags (html, head, body, main, div.page-container) present
  • All tags properly closed and matched
  • Title tag with content
  • CSS stylesheet link present
  • Content structure valid
  • No structural errors

FAIL Criteria (BLOCKS PIPELINE)

  • Missing DOCTYPE
  • Missing required tags
  • Unmatched or improperly nested tags
  • Missing title or CSS link
  • Empty content
  • Malformed lists or tables

WARNING (Logged but doesn't block)

  • Missing viewport meta tag
  • First heading is not h1
  • Large heading jumps (h1 → h4)
  • Missing alt text on images
  • Missing href on links
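The rules above, combined with the `strict_mode` parameter, reduce to a small decision function. This is a sketch with the hypothetical name `exit_code`; the return values follow the documented exit codes (0 continues the pipeline, 1 stops it).

```python
def exit_code(error_count: int, warning_count: int,
              strict_mode: bool = False) -> int:
    """Map validation counts to the documented process exit code."""
    if error_count > 0:
        return 1  # INVALID: stop pipeline, trigger hook
    if strict_mode and warning_count > 0:
        return 1  # strict mode promotes warnings to failures
    return 0      # VALID: continue pipeline
```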

Implementation: Using Python Script

This validation is performed by the existing validate_html.py tool, run in structure validation mode:

bash
cd Calypso/tools

# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure

# Exit code:
# 0 = VALID (continue to next skill)
# 1 = INVALID (STOP pipeline)

Hook Integration

When validation FAILS:

bash
# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
#   - Page number
#   - HTML file path
#   - Validation report path
#   - Error details

# Hook behavior:
# - Log failure with details
# - Save error report
# - Notify user
# - STOP pipeline (no further processing)

Error Recovery

If validation fails:

  1. User reviews validation report
  2. User identifies issue in AI-generated HTML
  3. Options:
    • Fix HTML manually and re-validate
    • Re-run AI generation with improved prompt
    • Review source extraction data for errors
    • Proceed with caution (expert override)

Quality Metrics

Validation provides metrics:

  • Percentage of checks passing
  • Error severity levels
  • Content size (word count, element count)
  • Structure complexity

These metrics feed into final quality reports.
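For instance, the percentage of checks passing can be derived from the report's `summary` block. `pass_rate` is a hypothetical helper, and the input below reuses the figures from the sample report.

```python
def pass_rate(summary: dict) -> float:
    """Percentage of checks with status PASS, from a report summary."""
    total = summary["total_checks"]
    if total == 0:
        return 0.0
    return 100.0 * summary["passed"] / total


sample_summary = {"total_checks": 10, "passed": 9, "failed": 0, "warnings": 1}
print(f"{pass_rate(sample_summary):.1f}% of checks passed")
```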

Success Criteria

✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails

Next Steps After PASS

If validation passes:

  1. All pages of chapter processed through this gate
  2. Skill 4 (consolidate pages) merges individual page HTMLs
  3. Quality Gate 2 (semantic validate) checks semantic structure
  4. Continue through validation pipeline

Next Steps After FAIL

If validation fails:

  1. PIPELINE STOPS
  2. Hook validate-structure.sh triggered
  3. User receives error report with details
  4. User must fix issues and retry

Design Notes

  • This is the first deterministic quality gate
  • Uses proven validate_html.py tool
  • Catches structural issues before semantic analysis
  • Provides clear, actionable error messages
  • Essential for ensuring pipeline reliability

Testing

To test structure validation:

bash
# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html

# Should show: ✓ VALID

# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html

# Should show: ✗ INVALID with specific errors
