Darwin-Gödel Machine

A cognitive architecture that evolves populations of solutions while formally verifying improvements before self-modification.

Core Philosophy

Darwin: Generate diverse solution populations → Apply selection pressure → Evolve toward optimum Gödel: Verify improvements formally before accepting → Enable recursive self-improvement → Prove modifications beneficial

Combined: Explore solution space evolutionarily, but only commit changes with verification proofs.

THE EXECUTION LOOP

Every problem runs this loop. No exceptions. Depth scales with complexity.

┌─────────────────────────────────────────────────────────────────────────────┐
│  PHASE 1: DECOMPOSE                                                         │
│  ├─ Parse the problem into atomic sub-problems                              │
│  ├─ Identify constraints, success criteria, edge cases                      │
│  ├─ Define fitness function: What makes a solution "better"?                │
│  └─ Estimate complexity class → determines population size & generations    │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 2: GENESIS (Population Initialization)                               │
│  ├─ Generate N diverse initial solutions (N = 3-7 based on complexity)      │
│  ├─ Ensure diversity: different algorithms, paradigms, trade-offs           │
│  ├─ Each solution must be complete and executable (no stubs)                │
│  └─ Tag each with: approach_type, expected_strengths, expected_weaknesses   │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 3: EVALUATE (Fitness Assessment)                                     │
│  ├─ Score each solution against fitness function (1-100)                    │
│  ├─ Test against edge cases and adversarial inputs                          │
│  ├─ Measure: correctness, efficiency, readability, robustness               │
│  └─ Rank population by composite fitness score                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 4: EVOLVE (Selection + Mutation + Crossover)                         │
│  ├─ SELECT: Keep top 50% of population                                      │
│  ├─ MUTATE: Apply mutation operators to survivors (see §Mutations)          │
│  ├─ CROSSOVER: Combine strengths of top 2 solutions into hybrid             │
│  └─ Generate new candidates to restore population size                      │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 5: VERIFY (Gödel Proof Gate)                                         │
│  ├─ For each evolved solution, PROVE improvement over parent                │
│  ├─ Proof types: logical deduction, test coverage, complexity analysis      │
│  ├─ REJECT any mutation that cannot be formally justified                   │
│  └─ Only verified improvements pass to next generation                      │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 6: CONVERGE (Termination Check)                                      │
│  ├─ If best solution meets success criteria → DELIVER                       │
│  ├─ If fitness plateau (no improvement in 2 generations) → DELIVER best     │
│  ├─ If generation limit reached → DELIVER best with caveats                 │
│  └─ Else → Return to PHASE 4 with evolved population                        │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 7: REFLECT (Mandatory Self-Reflection)                               │
│  ├─ SOLUTION REFLECTION: Why did winner win? What trait was decisive?       │
│  ├─ PROCESS REFLECTION: Did I explore right space? What did I miss?         │
│  ├─ ASSUMPTION AUDIT: List all assumptions, mark validated/invalidated      │
│  ├─ MUTATION ANALYSIS: Which mutations helped? Which wasted cycles?         │
│  ├─ PROOF QUALITY: Were proofs rigorous or hand-wavy?                       │
│  ├─ FAILURE ANALYSIS: What would have caught mistakes earlier?              │
│  └─ Score reasoning quality 1-10, justify score                             │
├─────────────────────────────────────────────────────────────────────────────┤
│  PHASE 8: META-IMPROVE (Recursive Self-Improvement)                         │
│  ├─ Extract: What lessons apply to future problems?                         │
│  ├─ Propose: Concrete process improvements (not vague)                      │
│  ├─ Verify: Would proposed improvement actually help?                       │
│  ├─ If verified → Add to ACTIVE_LESSONS for this conversation               │
│  └─ Apply ACTIVE_LESSONS at start of next problem in conversation           │
└─────────────────────────────────────────────────────────────────────────────┘

COMPLEXITY SCALING

Problem Type	Population Size	Max Generations	Mutation Rate
Simple (one-liner fix)	3	2	Low
Medium (single function)	5	3	Medium
Complex (module/feature)	7	5	High
Architecture (system design)	7	7	High + Crossover

FITNESS FUNCTION TEMPLATE

Define before generating solutions:

FITNESS(solution) = weighted_sum(
    CORRECTNESS:   Does it produce correct output for all inputs?      (weight: 0.40)
    ROBUSTNESS:    Does it handle edge cases and failures gracefully?  (weight: 0.25)
    EFFICIENCY:    Time/space complexity relative to optimal?          (weight: 0.15)
    READABILITY:   Can a mid-level dev understand it in 30 seconds?    (weight: 0.10)
    EXTENSIBILITY: How hard to modify for likely future requirements?  (weight: 0.10)
)

Adjust weights based on problem priorities. User can override.

MUTATION OPERATORS

Apply during EVOLVE phase to create variants:

Code Mutations

Operator	Description	When to Apply
SIMPLIFY	Remove unnecessary complexity	When solution is >20 lines
GENERALIZE	Make specific code more abstract	When pattern appears 2+ times
SPECIALIZE	Optimize for specific use case	When generality hurts performance
EXTRACT	Pull out reusable component	When code can benefit others
INLINE	Remove unnecessary abstraction	When abstraction adds no value
PARALLELIZE	Add concurrency	When independent operations exist
MEMOIZE	Cache repeated computations	When same inputs recur
GUARD	Add defensive checks	When edge cases discovered

Architecture Mutations

Operator	Description	When to Apply
SPLIT	Decompose into smaller units	When module does too much
MERGE	Combine related components	When separation adds overhead
LAYER	Add abstraction layer	When coupling is too tight
FLATTEN	Remove unnecessary layers	When indirection hurts clarity
ASYNC	Convert to async processing	When blocking is unnecessary
CACHE	Add caching layer	When repeated expensive operations
QUEUE	Add message queue	When decoupling needed
RETRY	Add retry logic	When transient failures possible

ASSUMPTION TRACKING

Track assumptions throughout the ENTIRE loop, not just in reflection.

Assumption Log Format

ASSUMPTION LOG:
┌─────┬─────────────────────────────┬─────────┬──────────┬───────────┐
│ ID  │ Assumption                  │ Phase   │ Risk     │ Status    │
├─────┼─────────────────────────────┼─────────┼──────────┼───────────┤
│ A1  │ Input size < 10,000         │ DECOMP  │ Medium   │ UNCHECKED │
│ A2  │ No concurrent modifications │ GENESIS │ High     │ VALIDATED │
│ A3  │ API returns JSON            │ GENESIS │ Low      │ UNCHECKED │
│ A4  │ O(n²) acceptable for N<100  │ EVOLVE  │ Medium   │ VALIDATED │
└─────┴─────────────────────────────┴─────────┴──────────┴───────────┘

Assumption Risk Levels

Risk	Definition	Action Required
HIGH	If wrong, solution is fundamentally broken	MUST validate before delivery
MEDIUM	If wrong, solution degrades but works	SHOULD validate, document if not
LOW	If wrong, minor impact	Document, validate if easy

RULE: HIGH risk + Weak validation = STOP. Get stronger validation or flag uncertainty.

REFLECTION OUTPUT FORMAT

### REFLECTION (Phase 7)

#### Solution Analysis
- Winner: [ID] 
- Decisive trait: [what made it win]
- Emerged at: [Genesis / Generation N via mutation X]
- Biggest weakness: [trade-off accepted]

#### Process Analysis  
- Approaches NOT tried: [list 2-3 with reasons]
- Highest effort area: [phase/activity] — Justified: [yes/no]
- If starting over: [what would change]

#### Assumption Audit
| Assumption | Risk | Status | Evidence |
|------------|------|--------|----------|
| ... | ... | ... | ... |

Unvalidated HIGH-risk assumptions: [count] ← MUST BE 0

#### Self-Score: [1-10]
Justification: [why this score]

QUICK-START HEURISTICS

For rapid application without full formalism:

When time-constrained:

Generate 3 solutions (diverse approaches)
Score each on correctness + robustness only
Mutate top 1 solution once
Verify mutation improves fitness
Deliver best

When quality is paramount:

Full 7-solution population
5+ generations with crossover
All proof types required
Meta-improvement phase mandatory

ADVERSARIAL SELF-CHECK

Before delivering final solution, ask:

"What input would break this?"
"What assumption am I making that might be wrong?"
"If I had to attack this code, how would I?"
"What would a senior engineer critique?"
"Does the simplest version of this work just as well?"

If any answer reveals a flaw → one more evolution cycle.

PROJECT-SPECIFIC CONTEXT

When working on this Twilio Bulk Lookup codebase, consider these fitness criteria:

For Sidekiq Jobs:

Idempotency (can safely retry)
Rate limit handling
Error classification (retryable vs fatal)
Memory efficiency for large batches

For API Integrations:

Graceful degradation when API unavailable
Credential security
Response caching where appropriate
Webhook reliability

For Rails Models:

Query efficiency (N+1 prevention)
Validation completeness
Scope composability
Serialization safety

Search AI Tools

darwin-godwin-machine

Install this agent skill to your Project

SKILL.md