source-integrity
Use when processing external sources - validates checksums, metadata completeness, and expiry dates for research sources
Source Integrity
Purpose
Ensure all research sources and external inputs maintain:
- Accurate checksums (detect file modifications)
- Complete metadata (title, author, date, type)
- Valid expiry dates (prevent stale data usage)
- Consistent schema across all sources
When to Use This Skill
Activate automatically when:
- Processing external sources with the research-processing workflow
- Validating flashcard content hashes for Mochi sync
- Verifying meeting transcript integrity
- User explicitly requests source validation
- Any workflow depends on external data integrity
Integrity Requirements
1. Checksum Validation
Requirement: All sources must have content checksums for change detection
Supported algorithms:
- SHA-256 (preferred)
- MD5 (legacy support)
Validation process:
- Read source file content
- Calculate checksum of current content
- Compare to stored checksum in metadata
- Flag mismatch as modification
Pass:
checksum: "sha256:a7f3b2c1..."
Calculated checksum matches stored value.
Fail:
checksum: "sha256:a7f3b2c1..."
Calculated checksum: sha256:d4e5f6a7... (mismatch = file modified)
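The validation process above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `compute_checksum` and `checksum_matches` are hypothetical, and the `algorithm:hexdigest` string format is assumed from the examples in this section.

```python
import hashlib
from pathlib import Path


def compute_checksum(path: str, algorithm: str = "sha256") -> str:
    """Return the file's digest as 'algorithm:hexdigest' (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read in chunks so large sources are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return f"{algorithm}:{digest.hexdigest()}"


def checksum_matches(path: str, stored: str) -> bool:
    """Recalculate the checksum with the stored algorithm and compare."""
    algorithm, _, _ = stored.partition(":")
    if algorithm not in ("sha256", "md5"):
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    return compute_checksum(path, algorithm) == stored
```

A `False` result corresponds to the "Fail" case: the file content no longer matches the checksum stored in its metadata.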
2. Metadata Completeness
Required fields for research sources (conditional and optional fields are marked in the comments):
title: "Source Title"
kind: "file" | "url" | "note"
path: "/absolute/path/to/source.md" # for kind=file
url: "https://..." # for kind=url
checksum: "sha256:..."
added_utc: "YYYY-MM-DDTHH:MM:SSZ"
topic: "strategic-category" # e.g., competitive-analysis, pricing-strategy
expiry_date: "YYYY-MM-DD" # optional, based on source type
Optional but recommended:
author: "Author Name"
published_date: "YYYY-MM-DD"
summary: "Brief description"
tags: ["keyword1", "keyword2"]
Validation process:
- Read source metadata (YAML frontmatter or citations/sources.json entry)
- Verify all required fields present
- Validate field formats (dates, URLs, paths)
- Flag missing or malformed fields
Pass: All required fields present and properly formatted.
Fail examples:
- Missing checksum field
- Invalid date format (added_utc: "2025-10-21" instead of a full ISO 8601 timestamp such as "2025-10-21T14:30:00Z")
- Relative path instead of absolute (path: "notes/file.md")
- Unknown kind value (kind: "pdf" when only file/url/note are supported)
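As a rough sketch of the metadata checks described above, the function below takes already-parsed metadata as a plain dict and returns a list of problems. The name `validate_metadata`, the choice to treat `path` as required only for kind=file and `url` only for kind=url, and the timestamp pattern are assumptions drawn from the field list in this section. The pattern checks the shape of the timestamp only, not whether the date itself is valid.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

REQUIRED_FIELDS = ("title", "kind", "checksum", "added_utc", "topic")
ALLOWED_KINDS = ("file", "url", "note")
# Full UTC timestamp, e.g. 2025-10-21T14:30:00Z (a bare date is rejected).
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def validate_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if not meta.get(name)]

    kind = meta.get("kind")
    if kind and kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind: {kind!r}")

    added = meta.get("added_utc")
    if added and not TIMESTAMP_RE.fullmatch(str(added)):
        problems.append(f"added_utc is not a full ISO 8601 timestamp: {added!r}")

    if kind == "file":
        path = meta.get("path")
        if not path:
            problems.append("missing required field: path (kind=file)")
        elif not PurePosixPath(path).is_absolute():
            problems.append(f"path is not absolute: {path!r}")

    if kind == "url":
        parsed = urlparse(meta.get("url") or "")
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url is missing or malformed (kind=url)")

    return problems
```

Collecting every problem before returning, rather than stopping at the first one, lets a report list all required fixes in a single pass.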
3. Expiry Management
Purpose: Prevent using stale data in strategic decisions
Expiry guidelines by source type:
| Source Type | Suggested Expiry | Rationale |
|---|---|---|
| Frameworks | 1-2 years | Conceptual models change slowly |
| Market data | 3-6 months | Markets evolve quickly |
| Competitor intel | 6-12 months | Product changes, pricing shifts |
| Customer quotes | 6-12 months | Needs/priorities evolve |
| Internal meeting notes | 12 months | Context becomes stale |
Validation process:
- Read source expiry_date field (if present)
- Compare to current date
- If expired: Flag for review/refresh
- If no expiry_date: Suggest based on source type
Pass:
expiry_date: "2026-03-15"
Current date: 2025-10-21 (not expired)
Fail:
expiry_date: "2025-05-01"
Current date: 2025-10-21 (expired, requires refresh)
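The expiry check above can be illustrated with a short Python sketch. The function name `check_expiry`, the source-type keys, and the day counts are assumptions: the counts take the upper bound of each range in the guideline table and approximate a month as 30 days and a year as 365. Treating a source that expires today as still valid is also an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical keys; values are the upper bound of each suggested range.
SUGGESTED_EXPIRY_DAYS = {
    "framework": 730,
    "market-data": 180,
    "competitor-intel": 365,
    "customer-quote": 365,
    "meeting-notes": 365,
}


def check_expiry(expiry_date: Optional[str], today: date,
                 source_type: Optional[str] = None) -> dict:
    """Classify a source as valid, expired, or lacking an expiry date."""
    if not expiry_date:
        days = SUGGESTED_EXPIRY_DAYS.get(source_type)
        suggestion = (today + timedelta(days=days)).isoformat() if days else None
        return {"status": "no_expiry", "suggested_expiry": suggestion}

    remaining = (date.fromisoformat(expiry_date) - today).days
    if remaining < 0:
        return {"status": "expired", "days_overdue": -remaining}
    # A source expiring today still counts as valid (assumption).
    return {"status": "valid", "days_remaining": remaining}
```

With the dates from the examples above, `check_expiry("2026-03-15", date(2025, 10, 21))` reports 145 days remaining, and `check_expiry("2025-05-01", date(2025, 10, 21))` reports 173 days overdue.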
4. Schema Consistency
Requirement: All sources follow standardized schema
Research source schema (datasets/research/{topic}/{filename}.md):
---
title: "Source Title"
kind: "file" | "url" | "note"
topic: "strategic-category"
url: "https://..." # if applicable
checksum: "sha256:..."
added_utc: "YYYY-MM-DDTHH:MM:SSZ"
expiry_date: "YYYY-MM-DD"
---
# Source Title
## Key Insights
- Insight 1
- Insight 2
## Strategic Applications
- How this informs decisions
- Relevant use cases
## Citations / Quotes
> "Verbatim quote for future citation"
> — Author, Publication (Date)
## Related Internal Links
- [Meeting notes](datasets/meetings/...)
- [Epic](datasets/product/epics/...)
Meeting schema (see meeting-schema-validation skill)
Citation schema (citations/sources.json):
{
"id": "src_abc123",
"title": "Source Title",
"kind": "file|url|note",
"path": "/absolute/path",
"url": "https://..." (optional),
"checksum": "sha256:...",
"added_utc": "2025-10-21T14:30:00Z"
}
Validation Process
1. Load Source
Read source from:
- datasets/research/{topic}/{filename}.md, OR
- citations/sources.json entry, OR
- datasets/learning/cards/{topic}/{filename}.md (for flashcard validation)
2. Apply Integrity Checks
Run all checks in sequence:
Checksum:
# Calculate current checksum
sha256sum /path/to/source.md | awk '{print "sha256:"$1}'
# Compare to stored checksum in metadata
Metadata:
- Verify required fields present
- Validate date formats (ISO 8601)
- Validate paths (absolute, exists)
- Validate URLs (proper format)
Expiry:
- Check expiry_date against current date
- Flag expired sources
- Suggest expiry dates for sources without them
Schema:
- Verify YAML frontmatter format
- Check required sections exist (for markdown sources)
- Validate JSON structure (for citations/sources.json)
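For markdown sources, the schema check can be approximated with the standard library alone. The sketch below is deliberately simplified: it handles only flat `key: value` frontmatter lines (a real implementation would more likely use a YAML parser), and it assumes the four `##` sections shown in the research source schema are the required ones. The names `split_frontmatter` and `missing_sections` are hypothetical.

```python
REQUIRED_SECTIONS = (
    "Key Insights",
    "Strategic Applications",
    "Citations / Quotes",
    "Related Internal Links",
)


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a markdown source into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    fields = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields, "\n".join(lines[end + 1:])


def missing_sections(body: str) -> list[str]:
    """Return required '## ' headings that do not appear in the body."""
    headings = {line[3:].strip() for line in body.splitlines()
                if line.startswith("## ")}
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

The returned fields can then be passed to the metadata and expiry checks, while a non-empty result from `missing_sections` counts as a schema failure.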
3. Generate Report
If all pass:
# Source Integrity Validation: PASS
**Source**: [filename or title]
✓ Checksum valid: sha256:a7f3b2c1... (matches stored value)
✓ Metadata complete: All required fields present
✓ Not expired: expiry_date 2026-03-15 (145 days remaining)
✓ Schema valid: Proper YAML frontmatter and sections
**Status**: Source integrity verified
If any fail:
# Source Integrity Validation: FAIL
**Source**: [filename or title]
✗ Checksum mismatch:
- Stored: sha256:a7f3b2c1...
- Calculated: sha256:d4e5f6a7...
- **Action**: File modified. Review changes and update checksum.
✗ Missing metadata fields:
- `added_utc` not present
- `topic` not present
- **Action**: Add required fields to frontmatter
✗ Expired source:
- expiry_date: 2025-05-01 (173 days ago)
- **Action**: Review and refresh source, or extend expiry if still valid
**Required fixes**:
1. Recalculate and update checksum
2. Add missing metadata fields
3. Refresh expired source or justify extension
**Status**: NEEDS_FIX
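One possible way to assemble such a report from individual check results is sketched below. The `CheckResult` dataclass and `render_report` function are hypothetical names, and the output is a simplified version of the layouts shown above (one line per check, plus an action line for each failure).

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str          # e.g. "Checksum", "Metadata", "Expiry", "Schema"
    passed: bool
    detail: str        # human-readable outcome
    action: str = ""   # suggested fix, used only when the check fails


def render_report(source: str, results: list[CheckResult]) -> str:
    """Render a PASS/FAIL report; any single failure fails the whole source."""
    ok = all(result.passed for result in results)
    lines = [
        f"# Source Integrity Validation: {'PASS' if ok else 'FAIL'}",
        f"**Source**: {source}",
    ]
    for result in results:
        mark = "✓" if result.passed else "✗"
        lines.append(f"{mark} {result.name}: {result.detail}")
        if not result.passed and result.action:
            lines.append(f"- **Action**: {result.action}")
    lines.append("**Status**: " + ("Source integrity verified" if ok else "NEEDS_FIX"))
    return "\n".join(lines)
```

Because the overall status is derived with `all(...)`, a source with three passing checks and one failing check is still reported as FAIL.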
4. Block or Approve
If PASS:
- Source can be used in workflows
- Citations can reference this source
- Integrity confirmed
If FAIL:
- Source blocked from usage
- Workflows depending on this source flagged
- Must address violations before use
Integration with Workflows
Research Processing Integration
Invoked by:
- research-processing workflow (when adding new sources)
- source-normalization skill (after creating citation entry)
Blocking behavior:
- If integrity check fails → source not added to citations/sources.json
- User notified of required fixes
Mochi Sync Integration
Invoked by:
- mochi-sync workflow (validates flashcard content hashes)
Behavior:
- Checksums determine if card content changed
- If checksum mismatch → card marked for update in Mochi
- If checksum matches → skip card (already synced)
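The update-or-skip decision above can be sketched as a pure function over card contents and previously recorded hashes. This sketch does not call Mochi at all. The names `content_hash` and `plan_sync` are hypothetical, and treating a card with no recorded hash as needing an update is an assumption.

```python
import hashlib


def content_hash(card_text: str) -> str:
    """Hash card content in the 'sha256:hexdigest' form used for sources."""
    return "sha256:" + hashlib.sha256(card_text.encode("utf-8")).hexdigest()


def plan_sync(cards: dict[str, str], synced_hashes: dict[str, str]) -> dict[str, str]:
    """Map each card id to 'skip' (hash unchanged) or 'update' (changed or new)."""
    plan = {}
    for card_id, text in cards.items():
        unchanged = synced_hashes.get(card_id) == content_hash(text)
        plan[card_id] = "skip" if unchanged else "update"
    return plan
```

Keeping the decision separate from the sync call makes it easy to preview which cards would change before anything is sent.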
Meeting Processing Integration
Optional usage:
- Can validate meeting transcript integrity
- Detect if transcript modified after initial processing
- Flag for re-processing if changed
Success Criteria
Source integrity validated when:
- Checksum matches current file content
- All required metadata fields present and valid
- Source not expired (or expiry justified)
- Schema matches expected format
- Validation report shows PASS status
Common Mistakes
| Mistake | Fix |
|---|---|
| Forgetting to update checksum after edits | Recalculate and update metadata |
| Using relative paths | Convert to absolute paths |
| Missing expiry_date on time-sensitive sources | Add expiry based on source type |
| Invalid date format | Use ISO 8601: YYYY-MM-DDTHH:MM:SSZ |
| Ignoring expired sources | Refresh or explicitly justify continued use |
Related Skills
- source-normalization: Creates normalized source entries (invokes this skill for validation)
- research-processing: Uses this skill to validate external sources
- mochi-sync: Uses checksum logic from this skill for sync tracking
Anti-Rationalization Blocks
Common excuses that are explicitly rejected:
| Rationalization | Reality |
|---|---|
| "Checksum mismatch doesn't matter" | Integrity violation. Fix or fail. |
| "This source doesn't need expiry" | Add suggested expiry based on type. |
| "Close enough" on metadata | All required fields or fail. |
| "File path is obvious" | Use absolute paths, no assumptions. |