Agent skill

image-generator

Generate professional visuals using Gemini via browser automation with 6-gate quality control. Use when creating chapter illustrations, diagrams, or teaching visuals. NOT for stock photos or decorative images.

View SKILL.md on GitHub Repository

Stars 232

Forks 15

Install this agent skill to your Project

npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/92bilal26/image-generator

SKILL.md

Image Generator

Generate professional teaching visuals using Gemini 3 with multi-turn reasoning partnership.

Quick Start

bash

# 1. Start browser (via browser-use skill)
bash .claude/skills/browser-use/scripts/start-server.sh

# 2. Navigate to Gemini
# Use browser_navigate to https://gemini.google.com/

# 3. Generate image from creative brief
# Paste creative brief → Wait 30-35s → Verify 6 gates → Download

Core Principles

Reasoning over prediction - Creative briefs (Story/Intent/Metaphor) activate reasoning; pixel specs don't
Multi-turn partnership - Teach Gemini your standards through principle-based feedback
6-gate quality - Explicit pass/fail before download
Autonomous batch - No permission-asking between visuals

Input: Creative Brief Format

Receive from visual-asset-workflow:

markdown

## The Story
[Narrative about what's visualized]

## Emotional Intent
[What it should FEEL like]

## Visual Metaphor
[Universal concept for instant comprehension]

## Subject / Composition / Action / Location / Style
[Gemini 3 prompt structure]

## Color Semantics
Blue (#2563eb) = Authority | Green (#10b981) = Execution

## Typography Hierarchy
Largest: Key insight | Medium: Supporting | Smallest: Context

Do NOT convert to pixel specs - use as-is to activate reasoning.

Workflow (Per Visual)

Step	Action	Tool
1	Navigate to gemini.google.com	browser_navigate
2	Select "🍌 Create Image"	browser_click
3	Paste creative brief	browser_type
4	Wait 30-35 seconds	browser_wait_for
5	Verify 6 gates (below)	Visual inspection
6	If fail: Iterate with feedback (max 3)	browser_type
7	If pass: Download full size	browser_click
8	Copy to `apps/learn-app/static/img/part-{N}/chapter-{NN}/`	Bash
9	Embed in lesson immediately	Edit
10	NEW CHAT for next visual	browser_navigate

Quality Gates (ALL Must Pass)

Gate	Criterion	Fail Action
1. Spelling	99% accuracy (Y-Combinator, Kubernetes)	Iterate
2. Layout	Proportions match prompt (2×2 not 3×1)	Iterate
3. Color	Brand colors match (#2563eb not #002050)	Iterate
4. Typography	Largest = key concept (not decoration)	Iterate
5. Teaching	<5 sec concept grasp at target proficiency	Iterate
6. Uniqueness	Not duplicate of existing chapter image	New chat

Decision: ALL pass → Download | ANY fail → Iterate (max 3 tries)

Iteration: Principle-Based Feedback

When gate fails, provide teaching feedback:

Gate 4 FAILED: Typography hierarchy incorrect

The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).

Increase '$3T' to dominant size. Reduce '$100K' to supporting size.
Information importance drives sizing.

Batch Mode

When invoked with "generate all visuals":

For EACH visual in list:
  A. NEW CHAT (context isolation)
  B. Generate (paste brief)
  C. Verify 6 gates
  D. Iterate if needed (max 3)
  E. Download when pass
  F. Embed in lesson
  G. Log "✅ N/M"
  H. NEXT (no stopping)

Never ask: "Continue?" "Pause here?" "Review?"

Report at END only:

BATCH COMPLETE
✅ Generated: 16/18
⚠️ Deferred: 2 (quality issues)
Location: apps/learn-app/static/img/part-{N}/

Proficiency Limits

Level	Max Elements	Grasp Time
A2	5-7	<5 sec
B1	7-10	<10 sec
C2	No limit	N/A

Token Conservation (Batch Mode)

For >8 visuals, condense briefs:

Original (250 tokens):

"Top Layer shows Coordinator at center top with label 'Orchestrator'
featuring conductor icon, with role 'Strategic oversight'..."

Condensed (80 tokens):

"Top Layer - Coordinator: Center top, 'Orchestrator' (conductor),
Role: 'Strategic oversight', Gold (#fbbf24), Large hexagon."

Keep: Story, Intent, Metaphor, Colors, Reasoning Condense: Long examples → Short labels

Anti-Patterns

Don't	Why
Accept first output without 6 gates	Quality standard violation
Ask permission between batch items	Breaks autonomous agency
Convert briefs to pixel specs	Defeats reasoning activation
Skip embedding step	Creates orphan images
Reuse same chat for next visual	Context contamination

Session Interruption

If session ends mid-batch, create checkpoint:

markdown

# Checkpoint: Part {N}
Status: INTERRUPTED at 8/18

## Completed:
- ✅ Image 1: filename (embedded lesson-01.md)
- ✅ Image 2: filename (embedded lesson-02.md)

## Remaining:
- ⏳ Image 8: filename

On continuation: Read checkpoint → Resume → Update incrementally

Success Indicators

✅ All 6 gates verified before download
✅ Batch completion without permission-asking
✅ Principle-based iteration feedback
✅ Images organized by part/chapter
✅ Immediate embedding (no orphans)
✅ >85% production-ready rate

Maintainer

aiskillstore Core maintainer

Source details

Full Name: aiskillstore/marketplace
Branch: main
Path in repo: skills/92bilal26/image-generator
Topics: claude-code claude codex-skills skills codex claude-skills ai-skills

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

aiskillstore/marketplace

perigon-backend

Perigon ASP.NET Core + EF Core + Aspire conventions

232 15

Explore

aiskillstore/marketplace

perigon-agent

Pointers for Copilot/agents to apply Perigon conventions

232 15

Explore

aiskillstore/marketplace

perigon-angular

Angular 21+ standalone/Material/signal conventions for Perigon WebApp

232 15

Explore

aiskillstore/marketplace

fastapi-mastery

Comprehensive FastAPI development skill covering REST API creation, routing, request/response handling, validation, authentication, database integration, middleware, and deployment. Use when working with FastAPI projects, building APIs, implementing CRUD operations, setting up authentication/authorization, integrating databases (SQL/NoSQL), adding middleware, handling WebSockets, or deploying FastAPI applications. Triggered by requests involving .py files with FastAPI code, API endpoint creation, Pydantic models, or FastAPI-specific features.

232 15

Explore

aiskillstore/marketplace

context7-efficient

Token-efficient library documentation fetcher using Context7 MCP with 86.8% token savings through intelligent shell pipeline filtering. Fetches code examples, API references, and best practices for JavaScript, Python, Go, Rust, and other libraries. Use when users ask about library documentation, need code examples, want API usage patterns, are learning a new framework, need syntax reference, or troubleshooting with library-specific information. Triggers include questions like "Show me React hooks", "How do I use Prisma", "What's the Next.js routing syntax", or any request for library/framework documentation.

232 15

Explore

aiskillstore/marketplace

browser-use

Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.

232 15

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Image Generator

Quick Start

Core Principles

Input: Creative Brief Format

Workflow (Per Visual)

Quality Gates (ALL Must Pass)

Iteration: Principle-Based Feedback

Batch Mode

Proficiency Limits

Token Conservation (Batch Mode)

Anti-Patterns

Session Interruption

Success Indicators

Recommended Agent Skills

perigon-backend

perigon-agent

perigon-angular

fastapi-mastery

context7-efficient

browser-use