Agent skill
confidence_scorer
Install this agent skill to your Project
npx add-skill https://github.com/transilienceai/communitytools/tree/main/projects/pentest/.claude/skills/techstack-identification/confidence_scorer
SKILL.md
Confidence Scorer Skill
Overview
Calculates confidence levels for identified technologies based on signal quantity, quality, source diversity, and corroboration patterns.
Metadata
- Skill ID: confidence_scorer
- Version: 1.0.0
- Category: Correlation
- Phase: 4 (Correlation)
- Agent: correlation_agent
- Execution Mode: Parallel with other correlation skills
Purpose
Assigns High, Medium, or Low confidence levels to each identified technology using quantitative scoring algorithms that consider evidence strength, source diversity, and cross-validation patterns.
Input Requirements
Required Inputs
{
"correlated_technologies": [
{
"technology": "string",
"category": "frontend|backend|infrastructure|security|devops|third_party",
"sources": ["array of source skill names"],
"source_count": "integer",
"agreement_score": "float (0-1)",
"corroborating_signals": [
{
"source": "string",
"signal": "string",
"strength": "strong|medium|weak"
}
]
}
],
"conflicts": [
{
"technology": "string",
"conflict_type": "string",
"severity": "high|medium|low"
}
]
}
Optional Inputs
scoring_weights: Custom weights for scoring criteria (defaults to standard weights)confidence_thresholds: Custom thresholds for High/Medium/Low (defaults to 0.7/0.4)
Operations
Operation: calculate_confidence
Assigns confidence level (High/Medium/Low) to each technology based on multi-factor scoring.
Input Parameters:
correlated_technologies: Technologies with corroboration dataconflicts: Detected conflicts affecting confidence
Scoring Criteria (weighted):
- Source Diversity (30%): Number of independent evidence sources
- Signal Strength (30%): Quality of evidence (explicit vs indirect)
- Agreement Score (20%): Cross-source consensus level
- Evidence Type (15%): Technical signals vs job postings
- Conflict Impact (5%): Presence of contradictory signals
Confidence Thresholds:
- High Confidence: Score ≥ 0.70
- Medium Confidence: Score ≥ 0.40 and < 0.70
- Low Confidence: Score < 0.40
Process:
- Calculate source diversity score (0-1 based on source count)
- Assess signal strength score (0-1 based on evidence quality)
- Factor in agreement score from correlation analysis
- Weight evidence types (technical > job posting)
- Apply conflict penalties if contradictions exist
- Compute weighted total (0-1 scale)
- Map to confidence level (High/Medium/Low)
- Generate explanation for assigned confidence
Output:
{
"technologies_with_confidence": [
{
"technology": "React",
"category": "frontend",
"version": "18.x",
"confidence": "High",
"confidence_score": 0.92,
"confidence_reasoning": {
"source_diversity_score": 1.0,
"source_count": 4,
"signal_strength_score": 0.95,
"agreement_score": 0.95,
"evidence_type_score": 1.0,
"conflict_penalty": 0.0,
"weighted_total": 0.92
},
"evidence_summary": {
"technical_signals": 3,
"job_posting_mentions": 5,
"strong_evidence_count": 3,
"medium_evidence_count": 2,
"weak_evidence_count": 0
}
},
{
"technology": "PostgreSQL",
"category": "backend",
"confidence": "Medium",
"confidence_score": 0.55,
"confidence_reasoning": {
"source_diversity_score": 0.33,
"source_count": 1,
"signal_strength_score": 0.60,
"agreement_score": 0.50,
"evidence_type_score": 0.60,
"conflict_penalty": 0.0,
"weighted_total": 0.55
},
"evidence_summary": {
"technical_signals": 0,
"job_posting_mentions": 8,
"strong_evidence_count": 0,
"medium_evidence_count": 1,
"weak_evidence_count": 0
},
"confidence_limitations": [
"Single source only (job_posting_analysis)",
"No technical signal corroboration",
"Recommend verification via database connection fingerprinting"
]
}
]
}
Operation: generate_confidence_summary
Creates aggregate confidence statistics for the entire report.
Input Parameters:
technologies_with_confidence: All scored technologies
Process:
- Count by confidence level (High/Medium/Low)
- Calculate overall score (weighted average)
- Identify confidence gaps (technologies needing more evidence)
- Generate quality metrics (% high confidence, etc.)
- Provide recommendations for improving confidence
Output:
{
"confidence_summary": {
"high_confidence_count": 45,
"medium_confidence_count": 23,
"low_confidence_count": 8,
"total_technologies": 76,
"overall_confidence_score": 0.78,
"high_confidence_percentage": 59.2,
"quality_rating": "Good",
"by_category": {
"frontend": {
"high": 12,
"medium": 5,
"low": 1,
"avg_score": 0.82
},
"backend": {
"high": 8,
"medium": 9,
"low": 3,
"avg_score": 0.65
},
"infrastructure": {
"high": 15,
"medium": 3,
"low": 2,
"avg_score": 0.85
},
"security": {
"high": 5,
"medium": 2,
"low": 1,
"avg_score": 0.73
},
"devops": {
"high": 3,
"medium": 2,
"low": 1,
"avg_score": 0.68
},
"third_party": {
"high": 2,
"medium": 2,
"low": 0,
"avg_score": 0.75
}
},
"recommendations": [
"23 technologies at medium confidence could be upgraded with additional technical signals",
"8 low confidence findings should be manually verified or removed"
]
}
}
Operation: identify_confidence_gaps
Highlights technologies that need additional evidence to improve confidence.
Input Parameters:
technologies_with_confidence: Scored technologies
Process:
- Identify medium/low confidence technologies
- Analyze missing evidence types (what would help)
- Suggest additional skills to run for validation
- Prioritize by importance (critical tech vs nice-to-have)
- Generate actionable recommendations
Output:
{
"confidence_gaps": [
{
"technology": "Redis",
"current_confidence": "Medium",
"current_score": 0.58,
"missing_evidence_types": [
"technical_signal",
"dns_records",
"http_headers"
],
"recommendations": [
"Check for redis-cli banner via network probing (if authorized)",
"Search for Redis-specific error messages in HTTP responses",
"Look for redis:// connection strings in public repositories"
],
"potential_score_improvement": 0.25,
"priority": "medium"
}
]
}
Scoring Algorithm Details
Source Diversity Score
score = min(source_count / 4, 1.0)
Rationale:
- 1 source = 0.25
- 2 sources = 0.50
- 3 sources = 0.75
- 4+ sources = 1.00
Signal Strength Score
strong_signals_weight = 1.0
medium_signals_weight = 0.6
weak_signals_weight = 0.3
score = (strong_count * 1.0 + medium_count * 0.6 + weak_count * 0.3) / total_signals
Evidence Type Score
technical_signal_weight = 1.0
job_posting_weight = 0.5
archive_data_weight = 0.4
score = (technical_count * 1.0 + job_count * 0.5 + archive_count * 0.4) / total_evidence
Conflict Penalty
high_severity_conflict = -0.30
medium_severity_conflict = -0.15
low_severity_conflict = -0.05
Applied to final score if conflicts exist for this technology
Overall Confidence Score
confidence_score = (
source_diversity_score * 0.30 +
signal_strength_score * 0.30 +
agreement_score * 0.20 +
evidence_type_score * 0.15 +
(1.0 - conflict_penalty) * 0.05
)
Confidence Level Mapping
High Confidence (≥ 0.70)
Criteria (any of):
- 3+ independent evidence sources with strong signals
- Explicit identifier found (meta tag, header, global variable)
- Job posting mentions + 2+ technical signals
- Direct API/service detection with verification
Interpretation: Very likely to be accurate
Medium Confidence (0.40 - 0.69)
Criteria (typical):
- 1-2 evidence sources with medium/strong signals
- Indirect indicators (URL patterns, cookies, DOM attributes)
- Job posting mentions without technical corroboration
- Single strong technical signal without cross-validation
Interpretation: Plausible but requires additional verification
Low Confidence (< 0.40)
Criteria (typical):
- Single weak signal only
- Speculation based on generic behavior
- Conflicting evidence without clear resolution
- Outdated information without current confirmation
Interpretation: Speculative hypothesis requiring manual validation
Output Format
Success Output
{
"status": "success",
"technologies_with_confidence": [...],
"confidence_summary": {...},
"confidence_gaps": [...],
"statistics": {
"total_technologies_scored": 76,
"high_confidence": 45,
"medium_confidence": 23,
"low_confidence": 8,
"average_confidence_score": 0.78
},
"execution_time_ms": 850
}
Error Output
{
"status": "error",
"error_code": "NO_TECHNOLOGIES",
"error_message": "No technologies provided for confidence scoring",
"partial_results": null
}
Error Handling
No Technologies
- IF technologies array empty → Return error, cannot score
- IF all technologies lack evidence → Assign all Low confidence
Missing Correlation Data
- IF corroboration data missing → Use evidence count only
- IF agreement scores unavailable → Default to 0.5
- IF conflicts data missing → Skip conflict penalties
Invalid Input
- IF technology missing required fields → Skip that technology, continue
- IF confidence threshold invalid → Use defaults (0.7/0.4)
- IF weights don't sum to 1.0 → Normalize weights
Dependencies
Required Skills
- signal_correlator (provides corroboration data)
Required Libraries
- Standard mathematical functions (weighted averages)
- JSON processing
External APIs
- None (pure computation)
Configuration
Settings (from settings.json)
{
"confidence_scoring": {
"high_confidence_threshold": 0.70,
"medium_confidence_threshold": 0.40,
"source_diversity_weight": 0.30,
"signal_strength_weight": 0.30,
"agreement_weight": 0.20,
"evidence_type_weight": 0.15,
"conflict_weight": 0.05,
"min_sources_for_high": 3
}
}
Usage Example
{
"operation": "calculate_confidence",
"inputs": {
"correlated_technologies": [ /* from signal_correlator */ ],
"conflicts": [ /* from signal_correlator */ ]
}
}
Best Practices
- Be conservative - When uncertain, assign lower confidence
- Document reasoning - Always explain confidence score
- Highlight limitations - Call out single-source dependencies
- Provide recommendations - Suggest how to improve confidence
- Recalculate after edits - Update scores when evidence added
Calibration Guidelines
Calibrate Against Known Examples
- Test on known tech stacks to validate scoring accuracy
- Adjust weights if consistently over/under-confident
- Review false positives and adjust signal strength weights
- Validate thresholds (0.7/0.4) against real-world accuracy
Quality Assurance
- Target distribution: 50-70% High, 20-35% Medium, <15% Low
- IF too many Low: Loosen thresholds or improve collection
- IF too many High: Tighten thresholds or strengthen criteria
- Monitor accuracy: Track user feedback on confidence levels
Limitations
- Confidence ≠ certainty (probabilistic estimates only)
- Context-dependent (job posting weight varies by company size)
- Temporal sensitivity (recent evidence > old evidence)
- Cannot detect all false positives (some will have high confidence)
Security Considerations
- No external requests (pure computation)
- Sanitize output (remove sensitive paths/URLs)
- Log scoring decisions for audit trail
- Preserve evidence for forensic review
Version History
- 1.0.0 (2024-01-20): Initial implementation with multi-factor scoring
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
techstack-identification
OSINT-based technology stack identification. Discovers company tech stacks using passive reconnaissance across 17 intelligence domains. Given a company name (and optional domain hint), infers frontend, backend, infrastructure, and security technologies using publicly available signals.
conflict_resolver
web-archive-analysis
Uses Wayback Machine to detect technology migrations over time
evidence_formatter
signal_correlator
dns-intelligence
Extracts technology signals from DNS records (MX, TXT, NS, CNAME, SRV)
Didn't find tool you were looking for?