Agent skill

image-generation

Generate images using Google Gemini (gemini-3-pro-image-preview). Requires GEMINI_API_KEY.


Install this agent skill in your project:

npx add-skill https://github.com/SawyerHood/middleman/tree/main/apps/backend/src/swarm/skills/builtins/image-generation

SKILL.md

Image Generation

Generate images using Google Gemini (gemini-3-pro-image-preview) for both text-to-image and image-to-image workflows.

Use the packaged CLI:

```bash
middleman image generate \
  --prompt "a cute robot bee in a garden" \
  --output "/path/to/output.png"
```

Image-to-image generation is supported with repeated --input-image flags:

```bash
middleman image generate \
  --prompt "turn this sketch into a painted poster with a limited teal and coral palette" \
  --input-image "/path/to/sketch.png" \
  --input-image "/path/to/reference.jpg" \
  --output "/path/to/output.png"
```

Options

  • --prompt (required): text description of the image to generate
  • --output (required): output file path (if the path has no extension, one is auto-detected)
  • --input-image (optional, repeatable): local source image(s) to send alongside the prompt
  • --aspect-ratio (optional): aspect ratio like 16:9, 1:1, 4:3
  • --size (optional): image size, default 1K
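
When the number of source images varies, the repeated --input-image flags can be assembled by a small wrapper. The following is a minimal sketch and not part of the skill: the function name generate_image and the MIDDLEMAN_BIN override are hypothetical names introduced here for illustration, and only the flags listed above are used.

```bash
# Hypothetical wrapper around `middleman image generate`.
# Usage: generate_image PROMPT OUTPUT [INPUT_IMAGE...]
# MIDDLEMAN_BIN is an assumed override for the executable (defaults to
# `middleman`); it is not an option defined by the skill itself.
generate_image() {
  prompt=$1
  output=$2
  shift 2

  # The remaining positional parameters are the input images. Append a
  # "--input-image PATH" pair for each one, then drop the original entries
  # so that "$@" holds only the flag/value pairs.
  n=$#
  for img in "$@"; do
    set -- "$@" --input-image "$img"
  done
  shift "$n"

  ${MIDDLEMAN_BIN:-middleman} image generate \
    --prompt "$prompt" \
    --output "$output" \
    "$@"
}

# Example call (commented out so the snippet can be sourced safely):
# generate_image "turn this sketch into a painted poster" \
#   "/path/to/output.png" "/path/to/sketch.png" "/path/to/reference.jpg"
```

Setting MIDDLEMAN_BIN=echo prints the assembled arguments instead of running the CLI, which is a cheap way to inspect what would be sent. Calling the function with no input images falls back to plain text-to-image generation.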

Output

The script prints JSON:

  • Success: { "ok": true, "file": "/path/to/output.png", "mimeType": "image/png" }
  • Failure: { "ok": false, "error": "..." }
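
A caller usually needs to branch on the ok field before using the file path. The sketch below shows one way to do that in plain shell; handle_image_result is a hypothetical helper, and its pattern matching assumes the two flat JSON shapes shown above with no escaped quotes inside the values. A real JSON parser would be more robust for anything beyond a quick check.

```bash
# Hypothetical helper: inspects the JSON printed by `middleman image generate`.
# On success it prints the output file path to stdout and returns 0.
# On failure it prints the error message to stderr and returns 1.
# Assumption: the JSON is a single flat object shaped like the examples above.
handle_image_result() {
  result=$1
  case $result in
    *'"ok": true'* | *'"ok":true'*)
      # Extract the string value of the "file" key.
      printf '%s\n' "$result" |
        sed -n 's/.*"file"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
      ;;
    *)
      # Extract the string value of the "error" key and report it on stderr.
      printf '%s\n' "$result" |
        sed -n 's/.*"error"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' >&2
      return 1
      ;;
  esac
}

# Example call (commented out so the snippet can be sourced safely):
# handle_image_result "$(middleman image generate \
#   --prompt "a cute robot bee in a garden" \
#   --output "/path/to/output.png")"
```

Returning a non-zero status on failure lets the helper compose with if and && in the usual way, independent of whatever exit code the CLI itself uses.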
