Agent skill

proof-of-work

Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.

Install this agent skill to your Project

npx add-skill https://github.com/MadAppGang/claude-code/tree/main/plugins/autopilot/skills/proof-of-work

SKILL.md

plugin: autopilot
updated: 2026-01-20

Proof-of-Work

Version: 0.1.0
Purpose: Generate validation artifacts for autonomous task completion
Status: Phase 1

When to Use

Use this skill when you need to:

Generate proof artifacts after task completion
Capture screenshots for UI verification
Parse and report test results
Calculate confidence scores for task validation
Determine if a task can be auto-approved

Overview

Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.

Proof Types by Task

Bug Fix Proof

Artifact	Required	Purpose
Git diff	Yes	Show minimal, focused changes
Test results	Yes	All tests passing
Regression test	Yes	Specific test for the bug
Error log (before/after)	Optional	Shows the error before the fix and its absence after

Feature Proof

Artifact	Required	Purpose
Screenshots	Yes	Visual verification
Test results	Yes	Functionality works
Coverage report	Yes	>= 80% coverage
Build output	Yes	Builds successfully
Deployment URL	Optional	Live demo

UI Change Proof

Artifact	Required	Purpose
Desktop screenshot	Yes	1920x1080 view
Mobile screenshot	Yes	375x667 view
Tablet screenshot	Yes	768x1024 view
Accessibility score	Yes	Lighthouse score >= 80
Visual regression	Optional	BackstopJS diff
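
The three tables above can be encoded as data so that a missing required artifact is caught before any scoring happens. The following is a minimal sketch and not part of the skill itself: the `TaskType` values, the artifact keys, and the `missingArtifacts` helper are hypothetical names chosen only to mirror the table rows.

```typescript
// Hypothetical encoding of the proof tables; all identifiers are illustrative.
type TaskType = 'bugfix' | 'feature' | 'ui';

interface ArtifactRule {
  name: string;      // artifact key, mirrors one table row
  required: boolean; // "Yes" rows are required, "Optional" rows are not
}

const PROOF_RULES: Record<TaskType, ArtifactRule[]> = {
  bugfix: [
    { name: 'git-diff', required: true },
    { name: 'test-results', required: true },
    { name: 'regression-test', required: true },
    { name: 'error-log', required: false },
  ],
  feature: [
    { name: 'screenshots', required: true },
    { name: 'test-results', required: true },
    { name: 'coverage-report', required: true },
    { name: 'build-output', required: true },
    { name: 'deployment-url', required: false },
  ],
  ui: [
    { name: 'desktop-screenshot', required: true },
    { name: 'mobile-screenshot', required: true },
    { name: 'tablet-screenshot', required: true },
    { name: 'accessibility-score', required: true },
    { name: 'visual-regression', required: false },
  ],
};

// Returns the required artifacts that were not provided for the given task type.
function missingArtifacts(taskType: TaskType, provided: string[]): string[] {
  const have = new Set(provided);
  return PROOF_RULES[taskType]
    .filter((rule) => rule.required && !have.has(rule.name))
    .map((rule) => rule.name);
}
```

Calling `missingArtifacts('bugfix', ['git-diff', 'test-results'])` would report that the regression test is still missing, while optional rows such as the error log never appear in the result.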

Screenshot Capture

Playwright Pattern:

typescript

import { chromium } from 'playwright';

// Viewports match the UI Change Proof table: desktop, mobile, tablet.
const VIEWPORTS = [
  { name: 'desktop', width: 1920, height: 1080 },
  { name: 'mobile', width: 375, height: 667 },
  { name: 'tablet', width: 768, height: 1024 },
];

async function captureScreenshots(url: string, outputDir: string) {
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext();
    const page = await context.newPage();

    for (const { name, width, height } of VIEWPORTS) {
      // Resize first, then reload so responsive layouts render at this size.
      await page.setViewportSize({ width, height });
      await page.goto(url);
      await page.waitForLoadState('networkidle');
      await page.screenshot({
        path: `${outputDir}/${name}.png`,
        fullPage: true,
      });
    }
  } finally {
    // Close the browser even if navigation or capture throws.
    await browser.close();
  }
}

Confidence Scoring

Algorithm:

typescript

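interface ProofArtifacts {
  testResults?: { passed: number; total: number }; // counts from the test run
  buildSuccessful?: boolean;                        // whether the build succeeded
  lintErrors?: number;                              // 0 earns the lint points
  screenshots?: string[];                           // paths to captured images
  testCoverage?: number;                            // percentage, 0-100
  performanceScore?: number;                        // declared but not scored by calculateConfidence
}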

function calculateConfidence(artifacts: ProofArtifacts): number {
  let score = 0;

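  // Tests (40 points): every test must pass, and at least one test must have run,
  // so an empty test run does not earn full points.
  if (artifacts.testResults) {
    const { passed, total } = artifacts.testResults;
    if (total > 0 && passed === total) {
      score += 40;
    }
  }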

  // Build (20 points)
  if (artifacts.buildSuccessful) {
    score += 20;
  }

  // Coverage (20 points)
  if (artifacts.testCoverage) {
    if (artifacts.testCoverage >= 80) score += 20;
    else if (artifacts.testCoverage >= 60) score += 15;
    else if (artifacts.testCoverage >= 40) score += 10;
    else score += 5;
  }

  // Screenshots (10 points)
  if (artifacts.screenshots) {
    if (artifacts.screenshots.length >= 3) score += 10;
    else if (artifacts.screenshots.length >= 1) score += 5;
  }

  // Lint (10 points)
  if (artifacts.lintErrors === 0) {
    score += 10;
  }

  return score;
}

Confidence Thresholds

Confidence	Action
>= 95%	Auto-approve (In Review -> Done)
80-94%	Manual review required
< 80%	Validation failed, iterate
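
The threshold table translates directly into a small decision function. This is a sketch under stated assumptions: the `ReviewAction` type and the `decideAction` name are hypothetical, and only the cutoffs (95 and 80) come from the table.

```typescript
// Hypothetical names; the cutoffs (95 and 80) come from the threshold table.
type ReviewAction = 'auto-approve' | 'manual-review' | 'iterate';

function decideAction(confidence: number): ReviewAction {
  if (confidence >= 95) return 'auto-approve';  // In Review -> Done
  if (confidence >= 80) return 'manual-review'; // a human must sign off
  return 'iterate';                             // validation failed
}
```

Because the scoring algorithm awards points in steps of 5, a score of 95 means exactly one category fell short by 5 points (for example coverage in the 60-79 band); anything lower drops to manual review or iteration.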

Proof Summary Template

markdown

# Proof of Work

**Task**: {issue_id}
**Type**: {task_type}
**Confidence**: {score}%

## Test Results
- Total: {total}
- Passed: {passed}
- Failed: {failed}
- Coverage: {coverage}%

## Build
- Status: {status}
- Duration: {duration}

## Screenshots
- Desktop: proof/desktop.png
- Mobile: proof/mobile.png
- Tablet: proof/tablet.png

## Artifacts
- test-results.txt
- coverage.json
- build-output.txt
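
Filling the template can be done with a plain string builder. The sketch below is illustrative only: the `ProofSummaryInput` shape and `renderProofSummary` name are hypothetical, the field names simply mirror the template placeholders, and `{failed}` is derived as total minus passed, which is an assumption about how that placeholder is meant to be computed.

```typescript
// Hypothetical input shape; field names mirror the template placeholders.
interface ProofSummaryInput {
  issueId: string;
  taskType: string;
  score: number;
  total: number;
  passed: number;
  coverage: number;
  buildStatus: string;
  buildDuration: string;
}

function renderProofSummary(p: ProofSummaryInput): string {
  // Assumption: {failed} is every test that did not pass.
  const failed = p.total - p.passed;
  return [
    '# Proof of Work',
    '',
    `**Task**: ${p.issueId}`,
    `**Type**: ${p.taskType}`,
    `**Confidence**: ${p.score}%`,
    '',
    '## Test Results',
    `- Total: ${p.total}`,
    `- Passed: ${p.passed}`,
    `- Failed: ${failed}`,
    `- Coverage: ${p.coverage}%`,
    '',
    '## Build',
    `- Status: ${p.buildStatus}`,
    `- Duration: ${p.buildDuration}`,
    '',
    '## Screenshots',
    '- Desktop: proof/desktop.png',
    '- Mobile: proof/mobile.png',
    '- Tablet: proof/tablet.png',
    '',
    '## Artifacts',
    '- test-results.txt',
    '- coverage.json',
    '- build-output.txt',
  ].join('\n');
}
```

The returned string is ordinary markdown, so it can be written to the session directory or posted as a comment without further processing.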

Examples

Example 1: Feature Proof Generation

typescript

const proof = {
  testResults: { passed: 15, total: 15 },
  buildSuccessful: true,
  lintErrors: 0,
  screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
  testCoverage: 85,
};

const confidence = calculateConfidence(proof);
// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%

Example 2: Partial Proof

typescript

const proof = {
  testResults: { passed: 12, total: 15 },  // Some failing
  buildSuccessful: true,
  lintErrors: 2,
  screenshots: ['desktop.png'],
  testCoverage: 65,
};

const confidence = calculateConfidence(proof);
// 0 (tests fail) + 20 (build) + 15 (coverage) + 5 (1 screenshot) + 0 (lint errors) = 40%
// Result: Validation failed, must iterate
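
The per-category arithmetic in the comments of both examples can also be produced programmatically, which makes a low score easier to audit. The following sketch mirrors the point bands of `calculateConfidence`; the `ScoreBreakdown` type and `scoreBreakdown` function are hypothetical additions, and treating an empty test run as earning no test points is an assumption of this sketch.

```typescript
interface ProofArtifacts {
  testResults?: { passed: number; total: number };
  buildSuccessful?: boolean;
  lintErrors?: number;
  screenshots?: string[];
  testCoverage?: number;
}

// Hypothetical result type: points per category plus their sum.
interface ScoreBreakdown {
  tests: number;
  build: number;
  coverage: number;
  screenshots: number;
  lint: number;
  total: number;
}

function scoreBreakdown(a: ProofArtifacts): ScoreBreakdown {
  const t = a.testResults;
  // Assumption: an empty test run (total of 0) earns no test points.
  const tests = t && t.total > 0 && t.passed === t.total ? 40 : 0;
  const build = a.buildSuccessful ? 20 : 0;
  const c = a.testCoverage;
  // Missing (or zero) coverage earns nothing; otherwise use the banded points.
  const coverage = !c ? 0 : c >= 80 ? 20 : c >= 60 ? 15 : c >= 40 ? 10 : 5;
  const shots = a.screenshots?.length ?? 0;
  const screenshots = shots >= 3 ? 10 : shots >= 1 ? 5 : 0;
  const lint = a.lintErrors === 0 ? 10 : 0;
  return {
    tests,
    build,
    coverage,
    screenshots,
    lint,
    total: tests + build + coverage + screenshots + lint,
  };
}
```

For the Example 2 input this yields 0 for tests, 20 for build, 15 for coverage, 5 for screenshots, and 0 for lint, for a total of 40.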

Best Practices

Always capture screenshots for UI work
Run full test suite, not just affected tests
Include coverage report for features
Build must pass before any proof is valid
Store proofs in session directory for debugging
Generate proof summary in markdown for Linear comments
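
The rule that the build must pass before any proof is valid is not enforced by the scoring function on its own: a proof with a failed build can still reach 80 points (40 + 20 + 10 + 10), which falls in the manual-review band. One way to enforce the rule is a gate applied after scoring. This is a sketch only; `GatedResult` and `applyBuildGate` are hypothetical names, and `rawScore` stands for whatever `calculateConfidence` returned.

```typescript
// Hypothetical wrapper around an already computed confidence score.
interface GatedResult {
  confidence: number;
  valid: boolean;
  reason?: string;
}

function applyBuildGate(
  rawScore: number,
  buildSuccessful: boolean | undefined,
): GatedResult {
  if (!buildSuccessful) {
    // A missing or failed build invalidates the proof regardless of other points.
    return { confidence: 0, valid: false, reason: 'build did not pass' };
  }
  return { confidence: rawScore, valid: true };
}
```

With this gate, a proof that scores 80 without a successful build is reported as invalid with a confidence of 0 instead of being routed to manual review.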

Maintainer

MadAppGang Core maintainer

Source details

Full Name: MadAppGang/claude-code
Branch: main
Path in repo: plugins/autopilot/skills/proof-of-work
License: MIT License
