Agent skill
gem
Multimodal AI processing using Google Gemini. Use for analyzing PDFs, images, videos, YouTube links, and other large documents. Ideal when you need to extract information from files that require vision or multimodal understanding.
Install this agent skill in your project
npx add-skill https://github.com/rajshah4/my-agent-skills/tree/main/skills/gem
SKILL.md
Gemini Multimodal Tool
Use the ai-gem CLI tool for multimodal AI processing via Google's Gemini API.
Usage
# Text queries
ai-gem "Write a haiku about Python programming"
# Analyze documents
ai-gem "Summarize this document" document.pdf
# Analyze images
ai-gem "What's in this image?" photo.jpg
# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"
# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png
# Web search
ai-gem "Current AI news" --search
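For batch work, the single-file form shown above can be wrapped in a loop. The following is a minimal sketch, not part of ai-gem itself: `summarize_pdfs` and the `GEM_CMD` override are hypothetical names introduced here for illustration, and the only ai-gem invocation used is the documented `ai-gem "prompt" file` form.

```shell
# Hypothetical helper (not shipped with ai-gem or hamel):
# summarize every PDF in a directory, one ai-gem call per file.
summarize_pdfs() {
  _gem_dir="${1:-.}"
  # Assumption: GEM_CMD is an override invented for this sketch so the
  # loop can be dry-run (e.g. GEM_CMD=echo); it defaults to ai-gem.
  _gem_cmd="${GEM_CMD:-ai-gem}"
  for _gem_file in "$_gem_dir"/*.pdf; do
    # If the glob matched nothing, the pattern is left as-is; skip it.
    [ -e "$_gem_file" ] || continue
    echo "== $_gem_file"
    "$_gem_cmd" "Summarize this document" "$_gem_file"
  done
}
```

Running `GEM_CMD=echo summarize_pdfs ./reports` prints the commands' arguments without calling the API, which is a cheap way to check which files would be sent.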
Requirements
- The GEMINI_API_KEY environment variable must be set
- The hamel package must be installed: pip install hamel
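Both requirements can be checked before the first call. The sketch below is an assumption-laden preflight check, not something provided by ai-gem or hamel: `check_gem_requirements` is a hypothetical helper, and it assumes that installing hamel places the `ai-gem` command on your PATH.

```shell
# Hypothetical preflight check (not part of ai-gem or hamel).
# Optional first argument: command name to look for (defaults to ai-gem).
check_gem_requirements() {
  _gem_cli="${1:-ai-gem}"
  _gem_status=0
  # Requirement 1: the API key must be present in the environment.
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    _gem_status=1
  fi
  # Requirement 2: the CLI must be installed.
  # Assumption: 'pip install hamel' puts ai-gem on PATH.
  if ! command -v "$_gem_cli" >/dev/null 2>&1; then
    echo "$_gem_cli not found on PATH (try: pip install hamel)" >&2
    _gem_status=1
  fi
  return "$_gem_status"
}
```

The function returns 0 only when both checks pass, so it can guard a call in the usual way: `check_gem_requirements && ai-gem "Summarize this document" document.pdf`.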
Supported Input Types
- PDFs
- Images (PNG, JPEG, GIF, WebP)
- Videos (MP4, etc.)
- YouTube URLs
- Plain text files
- Multiple files for comparison