Agent skill

invoking-gemini

Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/invoking-gemini

Metadata

Additional technical details for this skill

version
0.0.3

SKILL.md

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Structured outputs:

  • JSON Schema validation with property ordering guarantees
  • Pydantic model compliance
  • Strict schema adherence (enum values, required fields)

Cost optimization:

  • Parallel batch processing (Gemini Flash is lightweight)
  • High-volume simple tasks
  • Budget-constrained operations

Google ecosystem:

  • Integration with Google services
  • Vertex AI workflows
  • Google-specific APIs

Multi-modal tasks:

  • Image analysis with JSON output
  • Video processing
  • Audio transcription with structure

Available Models

gemini-2.0-flash-exp (Recommended):

  • Fast, cost-effective
  • Native JSON Schema support
  • Good for structured outputs

gemini-1.5-pro:

  • More capable reasoning
  • Better for complex tasks
  • Higher cost

gemini-1.5-flash:

  • Balanced speed/quality
  • Good for most tasks

See references/models.md for full model details.

Setup

Prerequisites:

  1. Install google-generativeai:

    bash
    uv pip install google-generativeai pydantic
    
  2. Configure API key via project knowledge file:

    Option 1 (recommended): Individual file

    • Create document: GOOGLE_API_KEY.txt
    • Content: Your API key (e.g., AIzaSy...)

    Option 2: Combined file

    • Create document: API_CREDENTIALS.json
    • Content:
      json
      {
        "google_api_key": "AIzaSy..."
      }
      

    Get your API key: https://console.cloud.google.com/apis/credentials

Basic Usage

Import the client:

python
import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

python
from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"

Advantages over Claude:

  • Guaranteed property ordering in JSON
  • Strict enum enforcement
  • Native schema validation (no prompt engineering)
  • Lower cost for simple extractions

Parallel Invocation

Process multiple prompts concurrently:

python
from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")

Use cases:

  • Batch classification tasks
  • Data labeling
  • Multiple independent analyses
  • A/B testing prompts

Error Handling

The client handles common errors:

python
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key

Common issues:

  • Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
  • Invalid model → Raises ValueError
  • Rate limit → Automatically retries with backoff
  • Network error → Returns None after retries

Advanced Features

Custom Generation Config

python
response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)

Multi-modal Input

python
# Image analysis with structured output
from pydantic import BaseModel

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See references/advanced.md for more patterns.

Comparison: Gemini vs Claude

Use Gemini when:

  • Structured output is primary goal
  • Cost is a constraint
  • Property ordering matters
  • Batch processing many simple tasks

Use Claude when:

  • Complex reasoning required
  • Long context needed (200K tokens)
  • Code generation quality matters
  • Nuanced instruction following

Use both:

  • Claude for planning/reasoning
  • Gemini for structured extraction
  • Parallel workflows with different strengths

Token Efficiency Pattern

Gemini Flash is cost-effective for sub-tasks:

python
# Claude (you) plans the approach
# Gemini executes structured extractions

data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...

Limitations

Not suitable for:

  • Tasks requiring deep reasoning
  • Long context (>1M tokens)
  • Complex code generation
  • Subjective creative writing

Token limits:

  • gemini-2.0-flash-exp: ~1M input tokens
  • gemini-1.5-pro: ~2M input tokens

Rate limits:

  • Vary by API tier
  • Client handles automatic retry

Examples

See references/examples.md for:

  • Data extraction from documents
  • Batch classification
  • Multi-modal analysis
  • Hybrid Claude+Gemini workflows

Troubleshooting

"API key not configured":

  • Add project knowledge file GOOGLE_API_KEY.txt with your API key
  • Or add to API_CREDENTIALS.json: {"google_api_key": "AIzaSy..."}
  • See Setup section above for details

Import errors:

bash
uv pip install google-generativeai pydantic

Schema validation failures:

  • Check Pydantic model definitions
  • Ensure prompt is clear about expected structure
  • Add examples to prompt if needed

Cost Comparison

Approximate pricing (as of 2024):

Gemini 2.0 Flash:

  • Input: $0.15 / 1M tokens
  • Output: $0.60 / 1M tokens

Claude Sonnet:

  • Input: $3.00 / 1M tokens
  • Output: $15.00 / 1M tokens

For 1000 simple extraction tasks (100 tokens each):

  • Gemini Flash: ~$0.10
  • Claude Sonnet: ~$2.00

Strategy: Use Claude for complex reasoning, Gemini for high-volume simple tasks.

Didn't find tool you were looking for?

Be as detailed as possible for better results