Agent skill

image-generator

Generate professional visuals using Gemini via browser automation with 6-gate quality control. Use when creating chapter illustrations, diagrams, or teaching visuals. NOT for stock photos or decorative images.

Stars 232
Forks 15

Install this agent skill to your Project

npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/92bilal26/image-generator

SKILL.md

Image Generator

Generate professional teaching visuals using Gemini 3 with multi-turn reasoning partnership.

Quick Start

bash
# 1. Start browser (via browser-use skill)
bash .claude/skills/browser-use/scripts/start-server.sh

# 2. Navigate to Gemini
# Use browser_navigate to https://gemini.google.com/

# 3. Generate image from creative brief
# Paste creative brief → Wait 30-35s → Verify 6 gates → Download

Core Principles

  1. Reasoning over prediction - Creative briefs (Story/Intent/Metaphor) activate reasoning; pixel specs don't
  2. Multi-turn partnership - Teach Gemini your standards through principle-based feedback
  3. 6-gate quality - Explicit pass/fail before download
  4. Autonomous batch - No permission-asking between visuals

Input: Creative Brief Format

Receive from visual-asset-workflow:

markdown
## The Story
[Narrative about what's visualized]

## Emotional Intent
[What it should FEEL like]

## Visual Metaphor
[Universal concept for instant comprehension]

## Subject / Composition / Action / Location / Style
[Gemini 3 prompt structure]

## Color Semantics
Blue (#2563eb) = Authority | Green (#10b981) = Execution

## Typography Hierarchy
Largest: Key insight | Medium: Supporting | Smallest: Context

Do NOT convert to pixel specs - use as-is to activate reasoning.

Workflow (Per Visual)

Step Action Tool
1 Navigate to gemini.google.com browser_navigate
2 Select "🍌 Create Image" browser_click
3 Paste creative brief browser_type
4 Wait 30-35 seconds browser_wait_for
5 Verify 6 gates (below) Visual inspection
6 If fail: Iterate with feedback (max 3) browser_type
7 If pass: Download full size browser_click
8 Copy to apps/learn-app/static/img/part-{N}/chapter-{NN}/ Bash
9 Embed in lesson immediately Edit
10 NEW CHAT for next visual browser_navigate

Quality Gates (ALL Must Pass)

Gate Criterion Fail Action
1. Spelling 99% accuracy (Y-Combinator, Kubernetes) Iterate
2. Layout Proportions match prompt (2×2 not 3×1) Iterate
3. Color Brand colors match (#2563eb not #002050) Iterate
4. Typography Largest = key concept (not decoration) Iterate
5. Teaching <5 sec concept grasp at target proficiency Iterate
6. Uniqueness Not duplicate of existing chapter image New chat

Decision: ALL pass → Download | ANY fail → Iterate (max 3 tries)

Iteration: Principle-Based Feedback

When gate fails, provide teaching feedback:

Gate 4 FAILED: Typography hierarchy incorrect

The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).

Increase '$3T' to dominant size. Reduce '$100K' to supporting size.
Information importance drives sizing.

Batch Mode

When invoked with "generate all visuals":

For EACH visual in list:
  A. NEW CHAT (context isolation)
  B. Generate (paste brief)
  C. Verify 6 gates
  D. Iterate if needed (max 3)
  E. Download when pass
  F. Embed in lesson
  G. Log "✅ N/M"
  H. NEXT (no stopping)

Never ask: "Continue?" "Pause here?" "Review?"

Report at END only:

BATCH COMPLETE
✅ Generated: 16/18
⚠️ Deferred: 2 (quality issues)
Location: apps/learn-app/static/img/part-{N}/

Proficiency Limits

Level Max Elements Grasp Time
A2 5-7 <5 sec
B1 7-10 <10 sec
C2 No limit N/A

Token Conservation (Batch Mode)

For >8 visuals, condense briefs:

Original (250 tokens):

"Top Layer shows Coordinator at center top with label 'Orchestrator'
featuring conductor icon, with role 'Strategic oversight'..."

Condensed (80 tokens):

"Top Layer - Coordinator: Center top, 'Orchestrator' (conductor),
Role: 'Strategic oversight', Gold (#fbbf24), Large hexagon."

Keep: Story, Intent, Metaphor, Colors, Reasoning Condense: Long examples → Short labels

Anti-Patterns

Don't Why
Accept first output without 6 gates Quality standard violation
Ask permission between batch items Breaks autonomous agency
Convert briefs to pixel specs Defeats reasoning activation
Skip embedding step Creates orphan images
Reuse same chat for next visual Context contamination

Session Interruption

If session ends mid-batch, create checkpoint:

markdown
# Checkpoint: Part {N}
Status: INTERRUPTED at 8/18

## Completed:
- ✅ Image 1: filename (embedded lesson-01.md)
- ✅ Image 2: filename (embedded lesson-02.md)

## Remaining:
- ⏳ Image 8: filename

On continuation: Read checkpoint → Resume → Update incrementally

Success Indicators

  • ✅ All 6 gates verified before download
  • ✅ Batch completion without permission-asking
  • ✅ Principle-based iteration feedback
  • ✅ Images organized by part/chapter
  • ✅ Immediate embedding (no orphans)
  • ✅ >85% production-ready rate

Didn't find tool you were looking for?

Be as detailed as possible for better results