Agent skill

openai

OpenAI compatibility layer for Ollama. Use the official OpenAI Python library to interact with Ollama, enabling easy migration from OpenAI and compatibility with LangChain, LlamaIndex, and other OpenAI-based tools.

Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/openai

SKILL.md

Ollama OpenAI Compatibility

Overview

Ollama provides an OpenAI-compatible API at /v1/* endpoints. This allows using the official openai Python library with Ollama, enabling:

  • Migration - Drop-in replacement for OpenAI API
  • Tool ecosystem - Works with LangChain, LlamaIndex, etc.
  • Familiar interface - Standard OpenAI patterns

Quick Reference

Endpoint               Method   Purpose
/v1/models             GET      List models
/v1/completions        POST     Text generation
/v1/chat/completions   POST     Chat completion
/v1/embeddings         POST     Generate embeddings

Limitations

The OpenAI compatibility layer does not expose these native Ollama operations:

  • Show model details (/api/show)
  • List running models (/api/ps)
  • Copy model (/api/copy)
  • Delete model (/api/delete)

Use bazzite-ai-jupyter:chat or bazzite-ai-jupyter:ollama for these operations.
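
If neither skill is at hand, the native endpoints listed above can also be reached over plain HTTP. The following is a minimal sketch, not the skills' own implementation: the helper names `build_native_request` and `call_native` are hypothetical, and the `model` field in the `/api/show` body is an assumption that may differ between Ollama versions.

```python
import json
import os
import urllib.request

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")

def build_native_request(path, payload=None, method=None, host=OLLAMA_HOST):
    """Build (but do not send) a request to a native Ollama endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    if method is None:
        # Default to POST when a JSON body is present, GET otherwise.
        method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        f"{host.rstrip('/')}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

def call_native(request, timeout=5):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Prepared requests; the body shape for /api/show is an assumption.
show_req = build_native_request("/api/show", {"model": "llama3.2:latest"})
ps_req = build_native_request("/api/ps")
```

With a server running, `call_native(ps_req)` should return the parsed JSON describing the running models.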

Setup

python
import os
from openai import OpenAI

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")

client = OpenAI(
    base_url=f"{OLLAMA_HOST}/v1",
    api_key="ollama"  # Required by library but ignored by Ollama
)

List Models

python
models = client.models.list()

for model in models.data:
    print(f"  - {model.id}")

Text Completions

python
response = client.completions.create(
    model="llama3.2:latest",
    prompt="Why is the sky blue? Answer in one sentence.",
    max_tokens=100
)

print(response.choices[0].text)
print(f"Completion tokens: {response.usage.completion_tokens}")

Chat Completion

Single Turn

python
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in one sentence."}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Multi-Turn Conversation

python
messages = [
    {"role": "system", "content": "You are a helpful math tutor."}
]

# Turn 1
messages.append({"role": "user", "content": "What is 2 + 2?"})
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=messages,
    max_tokens=50
)
assistant_msg = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_msg})
print("User: What is 2 + 2?")
print(f"Assistant: {assistant_msg}")

# Turn 2
messages.append({"role": "user", "content": "And what is that multiplied by 3?"})
response = client.chat.completions.create(
    model="llama3.2:latest",
    messages=messages,
    max_tokens=50
)
print("User: And what is that multiplied by 3?")
print(f"Assistant: {response.choices[0].message.content}")
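
The two turns above repeat the same append, request, append pattern. A minimal sketch of that pattern as a reusable function follows; `chat_turn` is a hypothetical helper name, and it assumes `client` is the OpenAI client created in Setup.

```python
def chat_turn(client, messages, user_text, model="llama3.2:latest", max_tokens=50):
    """Append a user message, request a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
    )
    reply = response.choices[0].message.content
    # Store the reply so the next turn sees the full conversation.
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Calling `chat_turn(client, messages, "What is 2 + 2?")` and then `chat_turn(client, messages, "And what is that multiplied by 3?")` would reproduce the conversation above, with `messages` growing by two entries per turn.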

Streaming

python
stream = client.chat.completions.create(
    model="llama3.2:latest",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True
)

for chunk in stream:
    # Skip chunks that carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
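
The loop above prints the text but does not keep it. If the full reply is needed afterwards (for example, to append it to a message history), the chunks can be accumulated while printing. This is a sketch under the assumption that each chunk has the `choices[0].delta.content` shape used above; `collect_stream` is a hypothetical helper name.

```python
def collect_stream(stream, echo=True):
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    return "".join(parts)
```

Usage would be `full_text = collect_stream(stream)` in place of the `for` loop.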

Generate Embeddings

python
response = client.embeddings.create(
    model="nomic-embed-text",  # assumes a dedicated embedding model has been pulled; chat models may not support embeddings
    input="Ollama makes running LLMs locally easy."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
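
A common next step is comparing two embeddings. The sketch below computes cosine similarity in plain Python; it is an illustration rather than part of the Ollama or OpenAI API, and `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for zero vectors; return 0.0 by convention here.
        return 0.0
    return dot / (norm_a * norm_b)
```

Given two responses from `client.embeddings.create(...)`, the call would be `cosine_similarity(resp_a.data[0].embedding, resp_b.data[0].embedding)`.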

Error Handling

python
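import openai

try:
    response = client.chat.completions.create(
        model="invalid-model",
        messages=[{"role": "user", "content": "Hello"}]
    )
except openai.NotFoundError as e:  # e.g. the model has not been pulled
    print(f"Model not found: {e}")
except openai.APIConnectionError as e:  # server not reachable at base_url
    print(f"Connection failed: {e}")
except openai.APIError as e:  # any other API-level failure
    print(f"Error: {type(e).__name__}: {e}")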

Migration from OpenAI

Before (OpenAI)

python
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

After (Ollama)

python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = client.chat.completions.create(
    model="llama3.2:latest",  # Change model name
    messages=[{"role": "user", "content": "Hello!"}]
)
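
For applications that need to switch between the two backends, the differences shown above (client arguments and model name) can be gathered in one place. This is a minimal sketch: `provider_settings` and the `LLM_PROVIDER` environment variable are hypothetical names chosen for illustration, and the model names are simply the ones used in the examples above.

```python
import os

def provider_settings(provider=None):
    """Return (client_kwargs, model) for 'openai' or 'ollama'."""
    # LLM_PROVIDER is a hypothetical variable name used only for this sketch.
    provider = (provider or os.getenv("LLM_PROVIDER", "ollama")).lower()
    if provider == "ollama":
        host = os.getenv("OLLAMA_HOST", "http://localhost:11434")
        return {"base_url": f"{host}/v1", "api_key": "ollama"}, "llama3.2:latest"
    if provider == "openai":
        # With no arguments, the OpenAI client reads OPENAI_API_KEY itself.
        return {}, "gpt-4"
    raise ValueError(f"Unknown provider: {provider!r}")
```

With the openai package installed, `kwargs, model = provider_settings()` followed by `client = OpenAI(**kwargs)` would give a client for either backend, and `model` can be passed to `client.chat.completions.create`.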

LangChain Integration

python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
    model="llama3.2:latest"
)

response = llm.invoke("What is Python?")
print(response.content)

LlamaIndex Integration

python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:11434/v1",
    api_key="ollama",
    model="llama3.2:latest"
)

response = llm.complete("What is Python?")
print(response.text)

Connection Health Check

python
import os

import requests

def check_ollama_health(model="llama3.2:latest"):
    """Check if Ollama server is running and model is available."""
    OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
    try:
        response = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
        if response.status_code == 200:
            models = response.json()
            model_names = [m.get("name", "") for m in models.get("models", [])]
            return True, model in model_names
        return False, False
    except requests.exceptions.RequestException:
        return False, False

server_ok, model_ok = check_ollama_health()
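
The check above compares model names as exact strings, so a request for "llama3.2" would not match an installed "llama3.2:latest". A possible refinement is sketched below, assuming Ollama's convention that a name without a tag refers to the `latest` tag; both helper names are hypothetical.

```python
def normalize_model_name(name):
    """Append ':latest' when the final path segment carries no tag."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"

def model_available(requested, installed_names):
    """Return True if the requested model matches an installed one after normalization."""
    wanted = normalize_model_name(requested)
    return wanted in {normalize_model_name(n) for n in installed_names}
```

Inside `check_ollama_health`, `model in model_names` could then be replaced with `model_available(model, model_names)`.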

When to Use This Skill

Use when:

  • Migrating from OpenAI to local LLMs
  • Using LangChain, LlamaIndex, or other OpenAI-based tools
  • You prefer the OpenAI client interface
  • Building applications that may switch between OpenAI and Ollama

Cross-References

  • bazzite-ai-jupyter:ollama - Native Ollama library (more features)
  • bazzite-ai-jupyter:chat - Direct REST API access
