Agent skill
dark-intelligence-workflow-step-1-identify
Sub-skill of dark-intelligence-workflow: Step 1 — Identify (+4).
Install this agent skill to your Project
npx add-skill https://github.com/vamseeachanta/workspace-hub/tree/main/.claude/skills/_archive/data/dark-intelligence-workflow/step-1-identify
SKILL.md
Step 1 — Identify (+4)
Step 1 — Identify
Locate the Excel/file containing engineering calculations.
What to look for:
- Formulas (cell formulas, array formulas)
- Named ranges (often contain key parameters)
- VBA macros (may contain iterative solvers or logic)
- Validation/check sheets (comparison against known answers)
- Input sheets with units and descriptions
- README or documentation tabs
Check doc index for the file:
uv run --no-project python -c "
import json
matches = []
with open('data/document-index/index.jsonl') as f:
for line in f:
rec = json.loads(line)
path_lower = rec.get('path', '').lower()
if '<filename>' in path_lower or '<category>' in path_lower:
matches.append(rec)
print(f'Found {len(matches)} matching documents')
for m in matches[:20]:
print(f\" {m.get('source', '?'):15s} {m.get('path', '')[:80]}\")
"
Output: file path, description of what the spreadsheet calculates, list of tabs/sheets.
Step 2 — Extract
Pull out generic methodology from the file. Extract each of these:
| Item | What to capture |
|---|---|
| Equations | Convert Excel formulas to LaTeX notation |
| Input ranges | Parameter names, symbols, units, typical value ranges |
| Output ranges | Result names, symbols, units, expected values for test cases |
| Standard references | Any codes/standards cited (API, DNV, ISO, ASME, etc.) |
| Methodology notes | Documentation within the file, assumptions, limitations |
| Unit systems | SI, Imperial, or mixed — note conversions used |
| Worked examples | Complete input-output pairs with known-correct answers |
Tips for Excel formula extraction:
=formulas: translate operators directly to math notation- Named ranges: map to variable names in the archive
IF/AND/OR: translate to conditional logic descriptionsVLOOKUP/INDEX/MATCH: identify the lookup table data- Array formulas (
Ctrl+Shift+Enter): note array dimensions - VBA
Function: extract algorithm as pseudocode
Step 3 — Sanitize (HARD GATE)
This step is non-negotiable. Extraction cannot proceed without passing.
Run the legal sanity scan on all extracted content:
bash scripts/legal/legal-sanity-scan.sh
Check for and remove:
- Client names, project names, project numbers
- Proprietary labels and internal codenames
- Client infrastructure identifiers (field names, platform names)
- Client-specific file paths or network locations
- Employee names (other than yourself for academic work)
If ANY block-severity violations are found: STOP. Remediate all violations before proceeding to Step 4.
Replace all client-specific references with generic equivalents:
- Project names -> generic descriptive names (e.g. "example_platform")
- Field names -> "field_A", "field_B" or generic descriptions
- Client tool names -> generic equivalents
Step 4 — Archive
Save extracted methodology as structured YAML.
Location: knowledge/dark-intelligence/<category>/<subcategory>/
Filename: dark-intelligence-<descriptive-name>.yaml
Schema:
# dark-intelligence-<name>.yaml
source_type: "excel|python|matlab|fortran"
source_description: "Generic description of what this calculates (no client refs)"
extracted_date: "YYYY-MM-DD"
legal_scan_passed: true
category: "<engineering category>"
subcategory: "<specific topic>"
equations:
- name: "<equation name>"
latex: "<LaTeX formula>"
excel_formula: "<original Excel formula, sanitized>"
standard: "<standard reference if any>"
description: "<what it computes>"
inputs:
- name: "<input name>"
symbol: "<LaTeX symbol>"
unit: "<unit>"
typical_range: [min, max]
test_value: <value for TDD>
outputs:
- name: "<output name>"
symbol: "<LaTeX symbol>"
unit: "<unit>"
test_expected: <expected value for TDD>
tolerance: <acceptable error>
worked_examples:
- description: "<example problem statement>"
inputs: {key: value}
outputs: {key: value}
use_as_test: true
assumptions:
- "<assumption 1>"
- "<assumption 2>"
references:
- "<standard or textbook reference>"
notes: "<any methodology notes, limitations, applicability>"
Validation: ensure legal_scan_passed: true is present and all fields
use generic descriptions free of client identifiers.
Step 5 — Generate TDD Test Data
Convert each worked example from the archive into a pytest test function.
Template:
def test_<calc_name>_from_dark_intelligence():
"""Extracted from legacy calculation — verified against original output."""
# Arrange — inputs from archive
<input_name> = <test_value>
# Act — call the implementation
result = <function>(<inputs>)
# Assert — expected output from archive
assert abs(result - <expected>) < <tolerance>, (
f"Expected {<expected>}, got {result}"
)
Rules:
- One test per worked example where
use_as_test: true - Use
tolerancefrom the archive for floating-point comparisons - Include the source description in the docstring
- Tests MUST fail initially (Red phase of TDD)
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
gsd-complete-milestone
Archive completed milestone and prepare for next version
gsd-reapply-patches
Reapply local modifications after a GSD update
gsd-verify-work
Validate built features through conversational UAT
gsd-thread
Manage persistent context threads for cross-session work
clinical-trial-protocol
Generate clinical trial protocols for medical devices or drugs through a modular, waypoint-based architecture with research-only and full protocol modes.
single-cell-rna-qc
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations.
Didn't find tool you were looking for?