gemini-api-guides

Comprehensive reference for Google's Gemini API. Use when building applications with: (1) Gemini models (Gemini 3 Pro, 2.5 Flash/Pro/Flash-Lite) for text and multimodal generation, (2) Image generation (Imagen, Nano Banana), video (Veo 3.1), music (Lyria), (3) Function calling, structured outputs, and agentic workflows, (4) Built-in tools: Google Search, Maps, Code Execution, URL Context, Computer Use, File Search, (5) Live API for real-time voice/video streaming, (6) Long context (1M+ tokens), embeddings, document/audio/video understanding, (7) Batch API, context caching, safety settings. Triggers: "gemini api", "google ai", "genai sdk", "gemini model", "veo", "imagen", "nano banana", "lyria", "live api", "vertex ai"

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/gemini-api-guides

Gemini API Skill

Build AI applications with Google's Gemini models and tools.

Quick Start

Installation

bash
# Python
pip install google-genai

# JavaScript/Node.js
npm install @google/genai

# Go
go get google.golang.org/genai

Environment Setup

bash
export GEMINI_API_KEY="your-api-key"
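
The SDK examples in this guide create the client with no arguments, which relies on the GEMINI_API_KEY variable exported above. A minimal fail-fast check, using only the standard library, might look like the sketch below. The helper name `require_api_key` is hypothetical and not part of any SDK.

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `require_api_key` is an illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` at startup surfaces a missing key immediately instead of on the first API request.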

Basic Usage

Python:

python
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here"
)
print(response.text)

JavaScript:

javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Your prompt here"
});
console.log(response.text);

REST:

bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "Your prompt here"}]}]}'
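
For environments without an SDK, the same REST call can be assembled with Python's standard library. The sketch below mirrors the curl command (same URL, headers, and body) and only builds the request object without sending it. The function name `build_generate_request` is an assumption made for illustration.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_generate_request(model, prompt, api_key):
    """Build (but do not send) the same POST request as the curl example."""
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it: pass the request to urllib.request.urlopen() and
# json.load() the returned response object.
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network traffic happens.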

Model Selection

| Model | Best For | Context Window |
|---|---|---|
| Gemini 3 Pro | Most intelligent tasks, multimodal reasoning, agentic workflows | See models-overview.md |
| Gemini 2.5 Pro | Complex reasoning, coding, extended thinking | 1M tokens |
| Gemini 2.5 Flash | Balanced performance, general tasks | 1M tokens |
| Gemini 2.5 Flash-Lite | High-volume, cost-sensitive, fastest | See models-overview.md |
| Imagen | High-fidelity image generation | N/A |
| Veo 3.1 | Video generation (8s, 720p/1080p with audio) | N/A |
| Nano Banana | Native image generation with Gemini 2.5 Flash | N/A |
| Nano Banana Pro | Native image generation with Gemini 3 Pro | N/A |

Reference Documentation Index

Getting Started

| Topic | File | Description |
|---|---|---|
| Setup & Libraries | getting-started.md | API keys, SDK installation, OpenAI compatibility |

Models & Pricing

| Topic | File | Description |
|---|---|---|
| Model Overview | models-overview.md | All models, capabilities, context windows |
| Pricing | api-pricing.md | Token costs, tool pricing |
| Rate Limits | rate-limits.md | RPM/TPM limits, quotas |
| Gemini 3 Guide | gemini-3.md | Gemini 3 specific features and best practices |
| Imagen | imagen.md | Image generation with Imagen model |
| Embeddings | embeddings.md | Text embeddings for search/RAG |
| Veo | veo.md | Video generation with Veo 3.1 (69K) |
| Lyria | lyria.md | Music generation with Lyria RealTime |
| Robotics | robotics.md | Gemini Robotics-ER 1.5 (42K) |

Core Capabilities

| Topic | File | Description |
|---|---|---|
| Text Generation | text-generation.md | Text generation, system instructions (38K) |
| Image Gen (Nano Banana) | image-generation-gemini.md | Native image generation with Gemini (LARGE: 174K) |
| Image Understanding | image-understanding.md | Vision, image analysis |
| Video Understanding | video-understanding.md | Video analysis, timestamps |
| Document Understanding | document-understanding.md | PDF and document processing |
| Speech Generation | speech-generation.md | Text-to-speech (TTS) |
| Audio Understanding | audio-understanding.md | Audio analysis, transcription |

Advanced Features

| Topic | File | Description |
|---|---|---|
| Thinking Mode | thinking.md | Extended reasoning capabilities |
| Thought Signatures | thought-signatures.md | EDGE CASE ONLY: Manual signature handling when NOT using official SDKs |
| Structured Outputs | structured-outputs.md | JSON schema responses |
| Function Calling | function-calling.md | Custom tool integration (54K) |
| Long Context | long-context.md | 1M+ token handling, context caching |

Tools

| Topic | File | Description |
|---|---|---|
| Tools Overview | tools-overview.md | Built-in tools summary, agent frameworks |
| Google Search | google-search.md | Web search grounding |
| Google Maps | google-maps.md | Location-aware grounding |
| Code Execution | code-execution.md | Python code execution tool |
| URL Context | url-context.md | URL content extraction |
| Computer Use | computer-use.md | Browser automation (preview) (44K) |
| File Search | file-search.md | RAG with document indexing |

Live API (Real-time Streaming)

| Topic | File | Description |
|---|---|---|
| Getting Started | live-api-getting-started.md | Low-latency voice/video interactions |
| Capabilities Guide | live-api-capabilities.md | Full capabilities and configurations (32K) |
| Tool Use | live-api-tools.md | Function calling & Search in Live API |
| Session Management | live-api-sessions.md | Session handling, time limits |
| Ephemeral Tokens | ephemeral-tokens.md | Short-lived auth for client-side WebSockets |

Guides

| Topic | File | Description |
|---|---|---|
| Batch API | batch-api.md | Async processing at 50% cost (47K) |
| Files API | files-api.md | Upload and manage media files (49K) |
| Context Caching | context-caching.md | Implicit & explicit caching for cost savings |
| Media Resolution | media-resolution.md | Control token allocation for media |
| Tokens | tokens.md | Understand and count tokens |
| Prompt Design | prompt-design.md | Prompt strategies and best practices (47K) |
| Logs & Datasets | logs-datasets.md | Enable logging, view in AI Studio |
| Data Logging & Sharing | data-logging-sharing.md | Storage and management of API logs |
| Safety Settings | safety-settings.md | Adjust safety filters |
| Safety Guidance | safety-guidance.md | Best practices for safe AI use |

Troubleshooting & Migration

| Topic | File | Description |
|---|---|---|
| Troubleshooting | troubleshooting.md | Diagnose and resolve common API issues (25K) |
| Vertex AI Comparison | vertex-ai-comparison.md | READ ONLY IF USER MENTIONS "VERTEX AI": Gemini Developer API vs Vertex AI differences |

Large Files - Search Patterns

For large reference files (>30K), use grep to find specific sections:

image-generation-gemini.md (174K):

bash
grep -n "## " references/image-generation-gemini.md  # List sections
grep -n "edit" references/image-generation-gemini.md  # Find editing info
grep -n "style" references/image-generation-gemini.md  # Find style transfer

veo.md (69K):

bash
grep -n "## " references/veo.md  # List sections
grep -n "audio" references/veo.md  # Find audio generation info

models-overview.md (67K):

bash
grep -n "gemini-3" references/models-overview.md
grep -n "context" references/models-overview.md

function-calling.md (54K):

bash
grep -n "## " references/function-calling.md
grep -n "parallel" references/function-calling.md  # Parallel function calls
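
Where grep is unavailable (for example inside a sandboxed Python tool), the same section lookup can be approximated in Python. This is a rough sketch: `grep_n` is a hypothetical helper, the sample text is made up for the demonstration, and Python's `re` syntax differs from grep's basic regular expressions for anything beyond simple literals.

```python
import re


def grep_n(pattern, text):
    """Return (line_number, line) pairs for lines matching `pattern`, like `grep -n`."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


# Made-up sample document; in practice read the reference file from disk.
sample = "# Title\n\n## Setup\nSome text.\n\n## Usage\n"
for number, line in grep_n(r"## ", sample):
    print(f"{number}:{line}")
```

The returned line numbers can then be used to read only the relevant slice of a large reference file.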

Common Patterns

Multimodal Input (Image + Text)

python
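from google import genai
from google.genai import types

client = genai.Client()

# Read the image as raw bytes; the SDK needs the bytes plus a MIME type.
with open("path/to/image.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Describe this image",
    ],
)
print(response.text)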
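
When images arrive in mixed formats, the MIME type has to match the file. The sketch below infers it from the file extension using only the standard library. `load_image` is a hypothetical helper, and it assumes the resulting pair is passed to `types.Part.from_bytes(data=..., mime_type=...)`.

```python
import mimetypes
from pathlib import Path


def load_image(path):
    """Return (raw_bytes, mime_type) for an image file on disk.

    Illustrative helper: the pair is meant to be passed as
    types.Part.from_bytes(data=raw_bytes, mime_type=mime_type).
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return Path(path).read_bytes(), mime_type
```

Guessing from the extension is cheap but trusts the file name; inspecting the file's leading bytes would be stricter.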

Function Calling

python
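from google import genai
from google.genai import types

client = genai.Client()

# Declare the function with a JSON-schema-style parameter spec.
tools = [
    types.Tool(function_declarations=[{
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }])
]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=tools)
)

# The model does not run the function; it returns the call for you to execute.
print(response.function_calls)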

Google Search Grounding

python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the latest AI developments?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)
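
A grounded response carries its sources alongside the text. The sketch below pulls (title, URI) pairs out of a response, assuming the response shape `candidates[0].grounding_metadata.grounding_chunks[i].web`. `grounding_sources` is a hypothetical helper, and the stand-in object only imitates that shape for demonstration.

```python
from types import SimpleNamespace


def grounding_sources(response):
    """Collect (title, uri) pairs from a response's grounding metadata.

    Assumes the response shape:
    response.candidates[0].grounding_metadata.grounding_chunks[i].web
    Missing or empty fields yield an empty list instead of an error.
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        if web is not None:
            sources.append((web.title, web.uri))
    return sources


# Stand-in object for demonstration; a real call returns an SDK response.
fake = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(grounding_chunks=[
        SimpleNamespace(web=SimpleNamespace(title="Example", uri="https://example.com")),
    ])
)])
print(grounding_sources(fake))
```

Returning an empty list for ungrounded responses lets callers treat both cases the same way.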

Thinking Mode

python
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Solve this complex problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=10000)  # cap on thinking tokens
    )
)

Streaming

python
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story"
):
    print(chunk.text, end="")
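
To keep the full text as well as print it, the streamed chunks can be accumulated. The sketch below assumes only that each chunk exposes a `text` attribute that may be empty or None for some chunks. `collect_text` is a hypothetical helper and the stand-in chunks are for demonstration.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Concatenate the text of streamed chunks, skipping chunks without text."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text:  # some chunks may carry no text (None or empty)
            parts.append(text)
    return "".join(parts)


# Stand-in chunks; in real use pass client.models.generate_content_stream(...).
demo = [
    SimpleNamespace(text="Once "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="upon a time"),
]
print(collect_text(demo))
```

Skipping empty chunks avoids printing the literal string "None" in the middle of the output.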

Key Concepts

Tool Execution Flow

Built-in tools (Google Search, Code Execution): Executed by Google

  1. Send prompt with tool config → Model executes tool → Response with grounded results

Custom tools (Function Calling): Executed by your code

  1. Send prompt with function declarations → Model returns function call JSON
  2. You execute function, send result back → Model generates final response
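
The "you execute" half of the custom-tool flow can be sketched as a small dispatcher: look up the function the model named, call it with the supplied arguments, and wrap the result for the follow-up request. Everything here is illustrative. `get_weather`, `REGISTRY`, and `run_function_call` are hypothetical names, and the `{"result": ...}` wrapper is one common convention rather than a requirement.

```python
def get_weather(location):
    """Hypothetical local implementation; returns canned data for the demo."""
    return {"location": location, "forecast": "unknown"}


# Map declared function names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def run_function_call(name, args):
    """Execute the function the model asked for and wrap the result.

    Returns a dict with the function name plus a `response` payload
    to send back to the model in the next request.
    """
    if name not in REGISTRY:
        return {"name": name, "response": {"error": f"unknown function {name!r}"}}
    try:
        result = REGISTRY[name](**args)
    except TypeError as exc:  # e.g. the model supplied wrong or missing arguments
        return {"name": name, "response": {"error": str(exc)}}
    return {"name": name, "response": {"result": result}}


print(run_function_call("get_weather", {"location": "Paris"}))
```

Reporting errors back in the response payload, instead of raising, gives the model a chance to correct its arguments on the next turn.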

Thought Signatures (Important)

  • If using official SDKs with chat feature: Thought signatures are handled automatically. No action needed.
  • If manually managing conversation history: Read thought-signatures.md for Gemini 3 Pro function calling requirements.

API Endpoints

| Endpoint | Purpose |
|---|---|
| /v1beta/models/{model}:generateContent | Standard generation |
| /v1beta/models/{model}:streamGenerateContent | Streaming |
| /v1beta/models/{model}:embedContent | Embeddings |
| /v1beta/models/{model}:countTokens | Token counting |

Base URL: https://generativelanguage.googleapis.com
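
Full URLs are the base URL joined with one of the endpoint paths. A small sketch of that composition follows; `endpoint_url` and `METHODS` are hypothetical names, and the method list covers only the four endpoints listed in this section.

```python
BASE_URL = "https://generativelanguage.googleapis.com"
METHODS = ("generateContent", "streamGenerateContent", "embedContent", "countTokens")


def endpoint_url(model, method, version="v1beta"):
    """Compose a full endpoint URL from the base URL, model, and method."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    return f"{BASE_URL}/{version}/models/{model}:{method}"


print(endpoint_url("gemini-2.5-flash", "countTokens"))
```

Validating the method name up front turns a typo into a local error instead of a failed HTTP request.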
