gemini-api-guides

Comprehensive reference for Google's Gemini API. Use when building applications with: (1) Gemini models (Gemini 3 Pro, 2.5 Flash/Pro/Flash-Lite) for text and multimodal generation, (2) Image generation (Imagen, Nano Banana), video (Veo 3.1), music (Lyria), (3) Function calling, structured outputs, and agentic workflows, (4) Built-in tools: Google Search, Maps, Code Execution, URL Context, Computer Use, File Search, (5) Live API for real-time voice/video streaming, (6) Long context (1M+ tokens), embeddings, document/audio/video understanding, (7) Batch API, context caching, safety settings. Triggers: "gemini api", "google ai", "genai sdk", "gemini model", "veo", "imagen", "nano banana", "lyria", "live api", "vertex ai"

Stars 163

Forks 31

Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/gemini-api-guides

SKILL.md

Gemini API Skill

Build AI applications with Google's Gemini models and tools.

Quick Start

Installation

bash

# Python
pip install google-genai

# JavaScript/Node.js
npm install @google/genai

# Go
go get google.golang.org/genai

Environment Setup

bash

export GEMINI_API_KEY="your-api-key"
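
Before creating a client, it can help to confirm that the key is actually set. The following is a minimal sketch, assuming the key is read from the GEMINI_API_KEY variable exported above; `require_api_key` is a hypothetical helper and not part of the SDK.

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of the google-genai SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` at startup turns a missing key into an immediate, readable error rather than a failed request later on.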

Basic Usage

Python:

python

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here"
)
print(response.text)

JavaScript:

javascript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Your prompt here"
});
console.log(response.text);

REST:

bash

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "Your prompt here"}]}]}'

Model Selection

Model	Best For	Context Window
Gemini 3 Pro	Most intelligent tasks, multimodal reasoning, agentic	See models-overview
Gemini 2.5 Pro	Complex reasoning, coding, extended thinking	1M tokens
Gemini 2.5 Flash	Balanced performance, general tasks	1M tokens
Gemini 2.5 Flash-Lite	High-volume, cost-sensitive, fastest	See models-overview
Imagen	High-fidelity image generation	N/A
Veo 3.1	Video generation (8s, 720p/1080p with audio)	N/A
Nano Banana	Native image gen with Gemini 2.5 Flash	N/A
Nano Banana Pro	Native image gen with Gemini 3 Pro	N/A

Reference Documentation Index

Getting Started

Topic	File	Description
Setup & Libraries	getting-started.md	API keys, SDK installation, OpenAI compatibility

Models & Pricing

Topic	File	Description
Model Overview	models-overview.md	All models, capabilities, context windows
Pricing	api-pricing.md	Token costs, tool pricing
Rate Limits	rate-limits.md	RPM/TPM limits, quotas
Gemini 3 Guide	gemini-3.md	Gemini 3 specific features and best practices
Imagen	imagen.md	Image generation with Imagen model
Embeddings	embeddings.md	Text embeddings for search/RAG
Veo	veo.md	Video generation with Veo 3.1 (69K)
Lyria	lyria.md	Music generation with Lyria RealTime
Robotics	robotics.md	Gemini Robotics-ER 1.5 (42K)

Core Capabilities

Topic	File	Description
Text Generation	text-generation.md	Text generation, system instructions (38K)
Image Gen (Nano Banana)	image-generation-gemini.md	Native image generation with Gemini (LARGE: 174K)
Image Understanding	image-understanding.md	Vision, image analysis
Video Understanding	video-understanding.md	Video analysis, timestamps
Document Understanding	document-understanding.md	PDF and document processing
Speech Generation	speech-generation.md	Text-to-speech (TTS)
Audio Understanding	audio-understanding.md	Audio analysis, transcription

Advanced Features

Topic	File	Description
Thinking Mode	thinking.md	Extended reasoning capabilities
Thought Signatures	thought-signatures.md	EDGE CASE ONLY: Manual signature handling when NOT using official SDKs
Structured Outputs	structured-outputs.md	JSON schema responses
Function Calling	function-calling.md	Custom tool integration (54K)
Long Context	long-context.md	1M+ token handling, context caching

Tools

Topic	File	Description
Tools Overview	tools-overview.md	Built-in tools summary, agent frameworks
Google Search	google-search.md	Web search grounding
Google Maps	google-maps.md	Location-aware grounding
Code Execution	code-execution.md	Python code execution tool
URL Context	url-context.md	URL content extraction
Computer Use	computer-use.md	Browser automation (preview) (44K)
File Search	file-search.md	RAG with document indexing

Live API (Real-time Streaming)

Topic	File	Description
Getting Started	live-api-getting-started.md	Low-latency voice/video interactions
Capabilities Guide	live-api-capabilities.md	Full capabilities and configurations (32K)
Tool Use	live-api-tools.md	Function calling & Search in Live API
Session Management	live-api-sessions.md	Session handling, time limits
Ephemeral Tokens	ephemeral-tokens.md	Short-lived auth for client-side WebSockets

Guides

Topic	File	Description
Batch API	batch-api.md	Async processing at 50% cost (47K)
Files API	files-api.md	Upload and manage media files (49K)
Context Caching	context-caching.md	Implicit & explicit caching for cost savings
Media Resolution	media-resolution.md	Control token allocation for media
Tokens	tokens.md	Understand and count tokens
Prompt Design	prompt-design.md	Prompt strategies and best practices (47K)
Logs & Datasets	logs-datasets.md	Enable logging, view in AI Studio
Data Logging & Sharing	data-logging-sharing.md	Storage and management of API logs
Safety Settings	safety-settings.md	Adjust safety filters
Safety Guidance	safety-guidance.md	Best practices for safe AI use

Troubleshooting & Migration

Topic	File	Description
Troubleshooting	troubleshooting.md	Diagnose and resolve common API issues (25K)
Vertex AI Comparison	vertex-ai-comparison.md	READ ONLY IF USER MENTIONS "VERTEX AI": Gemini Developer API vs Vertex AI differences

Large Files - Search Patterns

For large reference files (>30K), use grep to find specific sections:

image-generation-gemini.md (174K):

bash

grep -n "## " references/image-generation-gemini.md  # List sections
grep -n "edit" references/image-generation-gemini.md  # Find editing info
grep -n "style" references/image-generation-gemini.md  # Find style transfer

veo.md (69K):

bash

grep -n "## " references/veo.md  # List sections
grep -n "audio" references/veo.md  # Find audio generation info

models-overview.md (67K):

bash

grep -n "gemini-3" references/models-overview.md
grep -n "context" references/models-overview.md

function-calling.md (54K):

bash

grep -n "## " references/function-calling.md
grep -n "parallel" references/function-calling.md  # Parallel function calls
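
Where grep is not available, the same section listing can be approximated in Python. This is a rough sketch: `list_sections` is a hypothetical helper, and it assumes the reference files mark sections with `## `, as the grep patterns above imply.

```python
def list_sections(text, pattern="## "):
    """Return (line_number, line) pairs for lines containing pattern.

    Mirrors `grep -n "## "`: line numbers are 1-based and the match is a
    plain substring test, so "### " subheadings are included too.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern in line
    ]


# Inline sample standing in for a reference file's contents.
sample = "# Veo\n\n## Audio\ntext\n### Prompts\n"
for number, line in list_sections(sample):
    print(f"{number}:{line}")
```

For a real file, read its contents first and pass the resulting string to `list_sections`.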

Common Patterns

Multimodal Input (Image + Text)

python

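from google import genai
from google.genai import types

# Read the image as raw bytes; "image.jpg" is a placeholder path.
with open("image.jpg", "rb") as f:
    image_bytes = f.read()

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        types.Part.from_text(text="Describe this image")
    ]
)
print(response.text)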

Function Calling

python

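from google import genai
from google.genai import types

client = genai.Client()

# Declare the function with a JSON-schema style parameter spec.
tools = [
    types.Tool(function_declarations=[{
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }])
]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=tools)
)

# The model returns a function call for you to execute, not final text.
print(response.function_calls)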

Google Search Grounding

python

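from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the latest AI developments?",
    config=types.GenerateContentConfig(
        # Enable the built-in Google Search tool for grounded answers.
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)
print(response.text)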

Thinking Mode

python

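from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Solve this complex problem...",
    config=types.GenerateContentConfig(
        # thinking_budget caps the number of tokens spent on reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=10000)
    )
)
print(response.text)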

Streaming

python

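from google import genai

client = genai.Client()

# Print each chunk as it arrives instead of waiting for the full response.
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story"
):
    print(chunk.text, end="")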
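
If the full text is needed after streaming finishes, the chunks can be accumulated as they arrive. This is a minimal sketch that runs without the API: `collect_text` is a hypothetical helper, the chunks are stand-in objects, and it assumes a chunk's `text` may be empty or `None` when it carries no text.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Join the text of streamed chunks, skipping chunks without text."""
    return "".join(chunk.text for chunk in chunks if chunk.text)


# Stand-in chunks for illustration; real ones would come from
# client.models.generate_content_stream(...).
fake_stream = [
    SimpleNamespace(text="Once "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="upon a time"),
]
print(collect_text(fake_stream))
```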

Key Concepts

Tool Execution Flow

Built-in tools (Google Search, Code Execution): Executed by Google

Send prompt with tool config → Model executes tool → Response with grounded results

Custom tools (Function Calling): You execute

Send prompt with function declarations → Model returns function call JSON
You execute function, send result back → Model generates final response
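
The "you execute" step of the custom-tool flow can be sketched as a local dispatch table. This is an illustration only: the call is represented as a plain dict with `name` and `args`, an assumed simplification of what the SDK returns, and `get_weather`, `REGISTRY`, and `run_function_call` are hypothetical names.

```python
def get_weather(location):
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "sunny"}


# Map declared function names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def run_function_call(call, registry=REGISTRY):
    """Execute one model-requested call and package the result to send back."""
    name = call["name"]
    if name not in registry:
        # Report unknown functions as data so the model can recover.
        return {"name": name, "response": {"error": f"unknown function: {name}"}}
    result = registry[name](**call.get("args", {}))
    return {"name": name, "response": {"result": result}}


# Simplified stand-in for the function call the model would return.
call = {"name": "get_weather", "args": {"location": "Paris"}}
print(run_function_call(call))
```

The returned dict is what would be sent back to the model in the next turn so that it can generate the final response.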

Thought Signatures (Important)

If using official SDKs with chat feature: Thought signatures are handled automatically. No action needed.
If manually managing conversation history: Read thought-signatures.md for Gemini 3 Pro function calling requirements.

API Endpoints

Endpoint	Purpose
`/v1beta/models/{model}:generateContent`	Standard generation
`/v1beta/models/{model}:streamGenerateContent`	Streaming
`/v1beta/models/{model}:embedContent`	Embeddings
`/v1beta/models/{model}:countTokens`	Token counting

Base URL: https://generativelanguage.googleapis.com
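
Combining the base URL with a path from the table gives the full request URL. A small sketch, assuming only the four methods listed above; `endpoint_url` is a hypothetical helper, not an SDK function.

```python
BASE_URL = "https://generativelanguage.googleapis.com"
METHODS = {"generateContent", "streamGenerateContent", "embedContent", "countTokens"}


def endpoint_url(model, method, version="v1beta"):
    """Build the full URL for one of the endpoints listed above."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    return f"{BASE_URL}/{version}/models/{model}:{method}"


print(endpoint_url("gemini-2.5-flash", "countTokens"))
```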

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/gemini-api-guides
License: MIT License
