Ollama Skill
Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.
When to Use This Skill
This skill should be triggered when:
- Running local AI models with Ollama
- Building applications that interact with Ollama's API
- Implementing chat completions, embeddings, or streaming responses
- Setting up Ollama authentication or cloud models
- Configuring Ollama server (environment variables, ports, proxies)
- Using Ollama with OpenAI-compatible libraries
- Troubleshooting Ollama installations or GPU compatibility
- Implementing tool calling, structured outputs, or vision capabilities
- Working with Ollama in Docker or behind proxies
- Creating, copying, pushing, or managing Ollama models
Quick Reference
1. Basic Chat Completion (cURL)
Generate a simple chat response:
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
2. Simple Text Generation (cURL)
Generate a text response from a prompt:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
3. Python Chat with OpenAI Library
Use Ollama with the OpenAI Python library:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
4. Vision Model (Image Analysis)
Ask questions about images:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
5. Generate Embeddings
Create vector embeddings for text:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Passing a list of strings returns one embedding per input
embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
6. Structured Outputs (JSON Schema)
Get structured JSON responses:
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")


# Pydantic models describe the JSON shape the response must follow
class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool


class FriendList(BaseModel):
    friends: list[FriendInfo]


completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
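The same request can also be sketched against the native /api/chat endpoint, without the OpenAI library. This is a minimal sketch under two assumptions not stated in this document: that the endpoint accepts a JSON schema in a `format` field, and that sampling settings go under `options`. `FRIEND_LIST_SCHEMA`, `build_structured_chat`, and `parse_friends` are hypothetical names used only for illustration.

```python
import json

# Hand-written JSON schema mirroring the FriendList model above (illustrative).
FRIEND_LIST_SCHEMA = {
    "type": "object",
    "properties": {
        "friends": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "is_available": {"type": "boolean"},
                },
                "required": ["name", "age", "is_available"],
            },
        }
    },
    "required": ["friends"],
}

REQUIRED_KEYS = {"name", "age", "is_available"}


def build_structured_chat(prompt, model="llama3.1:8b"):
    """Build a /api/chat request body that asks for schema-shaped JSON."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": FRIEND_LIST_SCHEMA,  # assumption: schema is passed in "format"
        "stream": False,
        "options": {"temperature": 0},  # assumption: sampling options live here
    }


def parse_friends(content):
    """Decode the reply text and check that each friend has the required keys."""
    data = json.loads(content)
    for friend in data["friends"]:
        missing = REQUIRED_KEYS - set(friend)
        if missing:
            raise ValueError(f"friend is missing keys: {sorted(missing)}")
    return data["friends"]
```

The body returned by `build_structured_chat` would be posted to http://localhost:11434/api/chat, and the reply's message content passed to `parse_friends`. The key check is deliberately shallow; a Pydantic model, as in the example above, validates types as well.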
7. JavaScript/TypeScript Chat
Use Ollama with the OpenAI JavaScript library:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
8. Authentication for Cloud Models
Sign in to use cloud models:
# Sign in from CLI
ollama signin
# Then use cloud models
ollama run gpt-oss:120b-cloud
Or use API keys for direct cloud access:
export OLLAMA_API_KEY=your_api_key
curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
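The cURL call above can be sketched in Python using only the standard library. `build_cloud_request` is a hypothetical helper, not part of any Ollama library; it builds the request without sending it, so the headers and body can be inspected first.

```python
import json
import os
import urllib.request

CLOUD_API = "https://ollama.com/api"  # cloud base URL from this document


def build_cloud_request(prompt, model="gpt-oss:120b", api_key=None):
    """Build (but do not send) a POST request for the cloud /api/generate endpoint."""
    # Fall back to the OLLAMA_API_KEY environment variable, as in the cURL example.
    key = api_key or os.environ.get("OLLAMA_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set OLLAMA_API_KEY")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{CLOUD_API}/generate",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key for illustration; sending is left to the caller,
# for example with urllib.request.urlopen(request).
request = build_cloud_request("Why is the sky blue?", api_key="your_api_key")
```

Keeping the key in an environment variable rather than in source code matches the `export OLLAMA_API_KEY=...` step above.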
9. Configure Ollama Server
Set environment variables for server configuration:
macOS:
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Restart Ollama application
Linux (systemd):
# Edit service
systemctl edit ollama.service
# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Reload and restart
systemctl daemon-reload
systemctl restart ollama
Windows:
1. Quit Ollama from task bar
2. Search "environment variables" in Settings
3. Edit or create OLLAMA_HOST variable
4. Set value: 0.0.0.0:11434
5. Restart Ollama from Start menu
10. Check Model GPU Loading
Verify if your model is using GPU:
ollama ps
Output shows:
- 100% GPU - Fully loaded on GPU
- 100% CPU - Fully loaded in system memory
- 48%/52% CPU/GPU - Split between both
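If a script needs to act on these values, the three formats listed above can be parsed as in the following sketch. `parse_processor` is a hypothetical helper, and it assumes the value has already been extracted from the `ollama ps` output as a single string such as "48%/52% CPU/GPU".

```python
def parse_processor(value):
    """Return (cpu_percent, gpu_percent) from a value such as '48%/52% CPU/GPU'."""
    # Split "48%/52% CPU/GPU" into the percentages and the device names.
    percents, _, devices = value.strip().partition(" ")
    # Pair each device with its share, e.g. {"CPU": "48%", "GPU": "52%"}.
    shares = dict(zip(devices.split("/"), percents.split("/")))
    cpu = int(shares.get("CPU", "0%").rstrip("%"))
    gpu = int(shares.get("GPU", "0%").rstrip("%"))
    return cpu, gpu


print(parse_processor("100% GPU"))         # (0, 100)
print(parse_processor("48%/52% CPU/GPU"))  # (48, 52)
```

A GPU share below 100 means part of the model sits in system memory, which is the case the troubleshooting steps later in this skill address.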
Key Concepts
Base URLs
- Local API (default): http://localhost:11434/api
- Cloud API: https://ollama.com/api
- OpenAI Compatible: /v1/ endpoints for OpenAI libraries
Authentication
- Local: No authentication required for http://localhost:11434
- Cloud Models: Requires signing in (ollama signin) or an API key
- API Keys: For programmatic access to https://ollama.com/api
Models
- Local Models: Run on your machine (e.g., gemma3, llama3.2, qwen3)
- Cloud Models: Suffix -cloud (e.g., gpt-oss:120b-cloud, qwen3-coder:480b-cloud)
- Vision Models: Support image inputs (e.g., llava)
Common Environment Variables
- OLLAMA_HOST - Change bind address (default: 127.0.0.1:11434)
- OLLAMA_CONTEXT_LENGTH - Context window size (default: 2048 tokens)
- OLLAMA_MODELS - Model storage directory
- OLLAMA_ORIGINS - Allow additional web origins for CORS
- HTTPS_PROXY - Proxy server for model downloads
Error Handling
Status Codes:
- 200 - Success
- 400 - Bad Request (invalid parameters)
- 404 - Not Found (model doesn't exist)
- 429 - Too Many Requests (rate limit)
- 500 - Internal Server Error
- 502 - Bad Gateway (cloud model unreachable)
Error Format:
{
"error": "the model failed to generate a response"
}
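A minimal sketch of turning a failed response into a readable message, using the status codes and error format listed above. `STATUS_MEANINGS` and `describe_error` are hypothetical names; the function takes the HTTP status and the raw response body, however the caller obtained them.

```python
import json

# Status codes and meanings as listed in this section.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad Request (invalid parameters)",
    404: "Not Found (model doesn't exist)",
    429: "Too Many Requests (rate limit)",
    500: "Internal Server Error",
    502: "Bad Gateway (cloud model unreachable)",
}


def describe_error(status, body):
    """Combine the status meaning with the "error" field of the body, if present."""
    meaning = STATUS_MEANINGS.get(status, "Unknown status")
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError, TypeError):
        detail = ""  # body was empty, not JSON, or not a JSON object
    return f"{status} {meaning}: {detail}" if detail else f"{status} {meaning}"


print(describe_error(500, '{"error": "the model failed to generate a response"}'))
# 500 Internal Server Error: the model failed to generate a response
```

Falling back to the bare status meaning keeps the helper usable when a proxy or gateway returns a non-JSON body.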
Streaming vs Non-Streaming
- Streaming (default): Returns response chunks as JSON objects (NDJSON)
- Non-Streaming: Set "stream": false to get the complete response in one object
Reference Files
This skill includes comprehensive documentation in references/:
- llms-txt.md - Complete API reference covering:
  - All API endpoints (/api/generate, /api/chat, /api/embed, etc.)
  - Authentication methods (signin, API keys)
  - Error handling and status codes
  - OpenAI compatibility layer
  - Cloud models usage
  - Streaming responses
  - Configuration and environment variables
- llms.md - Documentation index listing all available topics:
  - API reference (version, model details, chat, generate, embeddings)
  - Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
  - CLI reference
  - Cloud integration
  - Platform-specific guides (Linux, macOS, Windows, Docker)
  - IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)
Use the reference files when you need:
- Detailed API parameter specifications
- Complete endpoint documentation
- Advanced configuration options
- Platform-specific setup instructions
- Integration guides for specific tools
Working with This Skill
For Beginners
Start with these common patterns:
- Simple generation: Use the /api/generate endpoint with a prompt
- Chat interface: Use /api/chat with a messages array
- OpenAI compatibility: Use OpenAI libraries with base_url='http://localhost:11434/v1/'
- Check GPU usage: Run ollama ps to verify model loading
Read the "Introduction" and "Quickstart" sections of llms-txt.md for foundational concepts.
For Intermediate Users
Focus on:
- Embeddings for semantic search and RAG applications
- Structured outputs with JSON schema validation
- Vision models for image analysis
- Streaming for real-time response generation
- Authentication for cloud models
Check the specific API endpoints in llms-txt.md for detailed parameter options.
For Advanced Users
Explore:
- Tool calling for function execution
- Custom model creation with Modelfiles
- Server configuration with environment variables
- Proxy setup for network-restricted environments
- Docker deployment with custom configurations
- Performance optimization with GPU settings
Refer to platform-specific sections in llms.md and configuration details in llms-txt.md.
Common Use Cases
Building a chatbot:
- Use the /api/chat endpoint
- Maintain message history in your application
- Stream responses for better UX
- Handle errors gracefully
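The four points above can be sketched together with the standard library. `ChatSession` is a hypothetical helper, not part of any Ollama library. The chunk fields it reads (`message.content`, `done`, and an `error` key for failures during a stream) are assumptions about the NDJSON chunk format, which this document does not spell out.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:11434/api/chat"  # local endpoint from this document


class ChatSession:
    """Keeps the message history for one conversation (hypothetical helper)."""

    def __init__(self, model="gemma3"):
        self.model = model
        self.messages = []

    def payload(self, user_text):
        """Record the user turn and return a request body with the full history."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": self.model, "messages": list(self.messages), "stream": True}

    def consume(self, lines, on_text=None):
        """Read NDJSON chunks, pass each text piece to on_text, store the reply."""
        parts = []
        for raw in lines:
            if isinstance(raw, bytes):
                raw = raw.decode("utf-8")
            if not raw.strip():
                continue  # ignore blank lines
            chunk = json.loads(raw)
            if "error" in chunk:
                raise RuntimeError(chunk["error"])
            text = chunk.get("message", {}).get("content", "")
            if text:
                parts.append(text)
                if on_text:
                    on_text(text)
            if chunk.get("done"):
                break
        reply = "".join(parts)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def send(self, user_text, on_text=None):
        """Post one turn to a running Ollama server and stream the reply."""
        body = json.dumps(self.payload(user_text)).encode("utf-8")
        request = urllib.request.Request(
            CHAT_URL, data=body, headers={"Content-Type": "application/json"}
        )
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request) as response:
            return self.consume(response, on_text)
```

With a local server running, `ChatSession().send("Why is the sky blue?", on_text=print)` would print each piece as it arrives. Because every turn is appended to `messages`, follow-up questions are sent with the earlier context.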
Creating embeddings for search:
- Use the /api/embed endpoint
- Store embeddings in a vector database
- Perform similarity search
- Implement RAG (Retrieval Augmented Generation)
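The similarity-search step can be sketched without a vector database. This is a minimal in-memory version using cosine similarity; `cosine_similarity` and `top_k` are hypothetical helpers, and the two-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k best (score, text) pairs from a list of (text, vector) entries."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: in practice each vector would come from the embeddings endpoint.
index = [
    ("why is the sky blue?", [1.0, 0.0]),
    ("why is the grass green?", [0.0, 1.0]),
]
best_score, best_text = top_k([0.9, 0.1], index, k=1)[0]
print(best_text)  # why is the sky blue?
```

For RAG, the texts returned by `top_k` would be added to the prompt sent to the chat endpoint. A linear scan like this is only practical for small collections, which is where a vector database comes in.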
Running behind a firewall:
- Set the HTTPS_PROXY environment variable
- Configure the proxy in Docker if containerized
- Ensure certificates are trusted
Using cloud models:
- Run ollama signin once
- Pull cloud models with the -cloud suffix
- Use the same API endpoints as local models
Troubleshooting
Model Not Loading on GPU
Check:
ollama ps
Solutions:
- Verify GPU compatibility in documentation
- Check CUDA/ROCm installation
- Review available VRAM
- Try smaller model variants
Cannot Access Ollama Remotely
Problem: Ollama only accessible from localhost
Solution:
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
See "How do I configure Ollama server?" in llms-txt.md for platform-specific instructions.
Proxy Issues
Problem: Cannot download models behind proxy
Solution:
# Set HTTPS_PROXY only; avoid setting HTTP_PROXY, because model downloads use HTTPS
export HTTPS_PROXY=https://proxy.example.com
# Restart Ollama
See "How do I use Ollama behind a proxy?" in llms-txt.md.
CORS Errors in Browser
Problem: Browser extension or web app cannot access Ollama
Solution:
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
See "How can I allow additional web origins?" in llms-txt.md.
Resources
Official Documentation
- Main docs: https://docs.ollama.com
- API Reference: https://docs.ollama.com/api
- Model Library: https://ollama.com/models
Official Libraries
- Python: https://github.com/ollama/ollama-python
- JavaScript: https://github.com/ollama/ollama-js
Community
- GitHub: https://github.com/ollama/ollama
- Community Libraries: See GitHub README for full list
Notes
- This skill was generated from official Ollama documentation
- All examples are tested and working with Ollama's API
- Code samples include proper language detection for syntax highlighting
- Reference files preserve structure from official docs with working links
- OpenAI compatibility means most OpenAI code works with minimal changes
Quick Command Reference
# CLI Commands
ollama signin # Sign in to ollama.com
ollama run gemma3 # Run a model interactively
ollama pull gemma3 # Download a model
ollama ps # List running models
ollama list # List installed models
# Check API Status
curl http://localhost:11434/api/version
# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"