Agent skill
image-generation
Generate images using Google Gemini (gemini-3-pro-image-preview). Requires GEMINI_API_KEY.
Install this agent skill to your Project
npx add-skill https://github.com/SawyerHood/middleman/tree/main/apps/backend/src/swarm/skills/builtins/image-generation
SKILL.md
Image Generation
Generate images using Google Gemini (gemini-3-pro-image-preview) for both text-to-image and image-to-image workflows.
Use the packaged CLI:
middleman image generate \
--prompt "a cute robot bee in a garden" \
--output "/path/to/output.png"
Image-to-image generation is supported with repeated --input-image flags:
middleman image generate \
--prompt "turn this sketch into a painted poster with a limited teal and coral palette" \
--input-image "/path/to/sketch.png" \
--input-image "/path/to/reference.jpg" \
--output "/path/to/output.png"
Options
- --prompt (required): text description of the image to generate
- --output (required): output file path (extension auto-detected when omitted)
- --input-image (optional, repeatable): local source image(s) to send alongside the prompt
- --aspect-ratio (optional): aspect ratio like 16:9, 1:1, 4:3
- --size (optional): image size, default 1K
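When an agent calls the CLI from code rather than a shell, it can help to assemble the argument list programmatically. The following is a minimal Python sketch, not part of the skill itself: the helper name `build_generate_args` is hypothetical, and it uses only the flags listed above.

```python
from typing import List, Optional, Sequence


def build_generate_args(
    prompt: str,
    output: str,
    input_images: Sequence[str] = (),
    aspect_ratio: Optional[str] = None,
    size: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `middleman image generate` (hypothetical helper)."""
    args = ["middleman", "image", "generate", "--prompt", prompt]
    # --input-image is repeatable: emit one flag per source image.
    for path in input_images:
        args += ["--input-image", path]
    # Optional flags are omitted entirely so the CLI's own defaults apply.
    if aspect_ratio is not None:
        args += ["--aspect-ratio", aspect_ratio]
    if size is not None:
        args += ["--size", size]
    args += ["--output", output]
    return args
```

The resulting list can be handed to `subprocess.run(args, capture_output=True, text=True)`, which avoids shell quoting problems in prompts that contain spaces or quotes.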
Output
The script prints JSON:
- Success:
{ "ok": true, "file": "/path/to/output.png", "mimeType": "image/png" }
- Failure:
{ "ok": false, "error": "..." }
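A caller will usually want to branch on the `ok` field. The sketch below shows one way to do that in Python, assuming the CLI writes nothing but the JSON object to standard output (the source does not confirm this). The names `parse_generate_output` and `ImageGenerationError` are hypothetical.

```python
import json
from typing import Any, Dict


class ImageGenerationError(RuntimeError):
    """Raised when the CLI reports a failure (hypothetical name)."""


def parse_generate_output(stdout: str) -> Dict[str, Any]:
    """Parse the JSON printed by the CLI and return the success fields.

    Assumes stdout holds only the JSON object shown above.
    """
    result = json.loads(stdout)
    if not isinstance(result, dict) or "ok" not in result:
        raise ValueError("unexpected output shape: missing 'ok' field")
    if result["ok"] is not True:
        # Failure payloads carry a human-readable "error" string.
        raise ImageGenerationError(result.get("error", "unknown error"))
    return {"file": result["file"], "mimeType": result["mimeType"]}
```

Raising on failure keeps the success path simple: the returned dictionary always contains the written file path and its MIME type.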