Agent skill

gemini-image

Analyze images using Gemini's vision capabilities. Use for image analysis, text extraction from screenshots, and visual content understanding.

Install this agent skill to your Project

npx add-skill https://github.com/johnlindquist/claude/tree/main/skills/gemini-image

SKILL.md

Gemini Image Analysis

Analyze images using Gemini Pro's vision capabilities.

Prerequisites

bash

pip install google-generativeai
export GEMINI_API_KEY=your_api_key
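
Every command in this skill depends on the exported key, so a small guard that fails fast when it is missing can save a confusing error later. This is a minimal sketch: the function name `require_gemini_key` is hypothetical and not part of the skill itself.

```bash
# Hypothetical helper (not part of the skill): fail early if the key is missing.
require_gemini_key() {
  # Treat both "unset" and "set but empty" as missing.
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; export it before running gemini" >&2
    return 1
  fi
}
```

One way to use it is `require_gemini_key && gemini -m pro -f image.png "Describe this image in detail"`.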

CLI Reference

Basic Image Analysis

bash

# Analyze an image
gemini -m pro -f /path/to/image.png "Describe this image in detail"

# With specific question
gemini -m pro -f screenshot.png "What error message is shown?"

# Multiple images
gemini -m pro -f image1.png -f image2.png "Compare these two images"
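
When the number of images varies, the repeated `-f` flags can be generated from a list of paths. The sketch below assumes only the `gemini -m pro -f <file> "<prompt>"` invocation shown above; `analyze_images` is a hypothetical wrapper name, and the code sticks to POSIX shell features so it does not depend on bash arrays.

```bash
# Hypothetical wrapper: analyze_images "<prompt>" image [image ...]
# Builds one -f flag per image, then passes the prompt as the final argument.
analyze_images() {
  prompt=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: analyze_images \"<prompt>\" image [image ...]" >&2
    return 2
  fi
  # Rewrite the positional parameters from: img1 img2 ...
  # into: -f img1 -f img2 ... (append the pair, then drop the original entry).
  for img in "$@"; do
    set -- "$@" -f "$img"
    shift
  done
  gemini -m pro "$@" "$prompt"
}
```

For example, `analyze_images "Compare these two images" image1.png image2.png` runs the same command as the multiple-image example above. Paths containing spaces are preserved because every expansion is quoted.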

Analysis Operations

General Description

bash

gemini -m pro -f image.png "Describe this image comprehensively:
1. Main subject/content
2. Colors and composition
3. Text visible (if any)
4. Context and purpose
5. Notable details"

Extract Text (OCR)

bash

gemini -m pro -f screenshot.png "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements."
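
Since extracted text usually needs a manual check, it can help to save the response to a file rather than read it from the terminal. The following is a sketch under the assumption that the CLI writes its answer to standard output; `ocr_to_file` is a hypothetical helper name.

```bash
# Hypothetical helper: ocr_to_file <image> <output.txt>
# Runs the OCR prompt from above and saves the response for manual review.
ocr_to_file() {
  image=$1
  out=$2
  if [ ! -f "$image" ]; then
    echo "no such image: $image" >&2
    return 1
  fi
  # Assumes gemini prints the extracted text on stdout.
  gemini -m pro -f "$image" "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements." > "$out" || return 1
  # An empty file means no text came back; report it instead of passing silently.
  [ -s "$out" ] || { echo "empty OCR result for $image" >&2; return 1; }
}
```

Calling `ocr_to_file screenshot.png screenshot.txt` leaves the text in `screenshot.txt`, where it can be compared against the original image.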

Code from Screenshot

bash

gemini -m pro -f code-screenshot.png "Extract the code from this screenshot.
Provide as properly formatted code with correct indentation.
Note any parts that are unclear or partially visible."

UI Analysis

bash

gemini -m pro -f ui-screenshot.png "Analyze this UI:
1. What application/website is this?
2. What page/screen is shown?
3. Main UI elements and their purpose
4. User flow/actions available
5. Any UX issues or suggestions"

Error Analysis

bash

gemini -m pro -f error-screenshot.png "Analyze this error:
1. What error is shown?
2. What is the likely cause?
3. How to fix it?
4. Any related information visible?"

Diagram Understanding

bash

gemini -m pro -f diagram.png "Explain this diagram:
1. What type of diagram is this?
2. Main components and their relationships
3. Data/process flow
4. Key takeaways"

Specific Use Cases

Debug Screenshot

bash

gemini -m pro -f debug-screen.png "I'm debugging an issue. From this screenshot:
1. What is the current state?
2. What errors or warnings are visible?
3. What should I look at?
4. Suggested next steps"

Compare Before/After

bash

gemini -m pro -f before.png -f after.png "Compare these before and after images:
1. What changed?
2. Is this an improvement?
3. Any issues in the 'after' version?
4. Anything missing?"

Design Feedback

bash

gemini -m pro -f design.png "Provide design feedback:
1. Visual hierarchy
2. Color usage
3. Typography
4. Spacing and alignment
5. Accessibility concerns
6. Suggestions for improvement"

Data Extraction

bash

gemini -m pro -f chart.png "Extract data from this chart:
1. Chart type
2. Data series and values
3. Axes labels and ranges
4. Key trends or insights
5. Output as structured data if possible"

Form Analysis

bash

gemini -m pro -f form.png "Analyze this form:
1. Form purpose
2. Fields and their types
3. Required vs optional
4. Validation rules visible
5. UX suggestions"

Workflow Patterns

Screenshot to Issue

bash

# Capture screenshot (macOS)
screencapture -i /tmp/bug.png

# Analyze and format as issue
gemini -m pro -f /tmp/bug.png "Create a bug report from this screenshot:

## Summary
[One-line description]

## Steps to Reproduce
[Inferred from screenshot]

## Expected Behavior
[What should happen]

## Actual Behavior
[What the screenshot shows]

## Environment
[Any visible system info]"
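
The two steps above can be chained so that a cancelled capture never reaches the analysis step. This is a rough sketch, not the skill's own tooling: `screenshot_to_report` and the default report path are hypothetical, the prompt is passed in as an argument, and the response is assumed to arrive on standard output.

```bash
# Hypothetical wrapper: screenshot_to_report "<prompt>" [screenshot.png] [report.md]
# Captures a region interactively (macOS), then saves the response as Markdown.
screenshot_to_report() {
  prompt=$1
  shot=${2:-/tmp/bug.png}
  report=${3:-/tmp/bug-report.md}   # assumed default location for the report
  rm -f "$shot"                     # avoid analyzing a stale capture
  screencapture -i "$shot"
  # Stop if no file was produced, for example because the capture was cancelled.
  if [ ! -s "$shot" ]; then
    echo "no screenshot captured" >&2
    return 1
  fi
  gemini -m pro -f "$shot" "$prompt" > "$report" || return 1
  # Print the report path so the caller can open or attach it.
  echo "$report"
}
```

Pass the bug-report prompt shown above as the first argument; the function prints the path of the saved report on success.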

UI to Code

bash

gemini -m pro -f ui-design.png "Generate React component code that recreates this UI:
- Use Tailwind CSS for styling
- Make it responsive
- Include proper TypeScript types
- Add appropriate accessibility attributes"

Documentation

bash

gemini -m pro -f app-screen.png "Write user documentation for this screen:
- What this screen is for
- How to use each feature
- Common tasks
- Tips and notes"

Image Types Supported

PNG, JPEG, GIF, WebP
Screenshots
Photos
Diagrams and charts
UI mockups
Code snippets
Documents
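
A quick extension check before calling the CLI can catch unsupported files early. The sketch below covers only the four formats named in the list above and judges by file name, not file contents; `is_supported_image` is a hypothetical helper.

```bash
# Hypothetical check: succeeds only for the file formats listed above
# (PNG, JPEG, GIF, WebP), judged by file extension alone.
is_supported_image() {
  # Lowercase the name so that .PNG and .png are treated alike.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.png|*.jpg|*.jpeg|*.gif|*.webp) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `is_supported_image diagram.png && gemini -m pro -f diagram.png "Explain this diagram"` skips the call when the extension is not recognized.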

Best Practices

Use clear images - Higher quality = better analysis
Crop to relevant area - Remove unnecessary context
Ask specific questions - Vague prompts get vague answers
Provide context - Tell Gemini what you're looking for
Verify extracted text - OCR isn't perfect
Multiple angles - Use multiple images for complex subjects

Maintainer

johnlindquist Core maintainer

Source details

Full Name: johnlindquist/claude
Branch: main
Path in repo: skills/gemini-image
