Agent skill
ollama-client
Phi-4 LLM interaction skill for generating text completions via the Ollama API. Use it for all LLM inference tasks, including section detection, summarization, recommendation generation, and quality evaluation.
Install this agent skill in your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/ollama-client
SKILL.md
Ollama Client Skill
Overview
This skill provides a Python wrapper for interacting with Ollama's REST API to generate text completions using the Phi-4 model (14B parameters, 16K context window). It handles timeouts, retries, and structured logging for all LLM operations.
When to Use
Use this skill when you need to:
- Generate text completions from Phi-4
- Run prompts for clinical analysis tasks
- Generate JSON-structured outputs from the LLM (a sketch follows the Basic Text Generation example below)
- Handle LLM inference with timeout protection
Installation
IMPORTANT: This skill has its own isolated virtual environment (.venv) managed by uv. Do NOT use system Python.
Initialize the skill's environment:
# From the skill directory
cd .agent/skills/ollama-client
uv sync # Creates .venv and installs dependencies from pyproject.toml
Dependencies are declared in pyproject.toml:
- requests: HTTP client for the Ollama API
Usage
CRITICAL: Always use uv run to execute code with this skill's .venv, NOT system Python.
Basic Text Generation
# From .agent/skills/ollama-client/ directory
# Run with: uv run python -c "..."
from ollama_client import OllamaClient
# Initialize client
client = OllamaClient(
    host="http://localhost:11434",  # Default from OLLAMA_HOST env var
    model="phi4:14b",               # Default from OLLAMA_MODEL env var
    timeout=300,                    # 5 minutes default
)

# Generate completion
result = client.generate(
    prompt="Summarize the following clinical note: ...",
    temperature=0.1,         # Low temperature for deterministic outputs
    max_tokens=1000,         # Optional token limit
    stop_sequences=["END"],  # Optional stop sequences
)
print(result["response"])
print(f"Execution time: {result['execution_time_ms']}ms")
With Environment Variables
import os
# Set in .env or docker-compose.yml
os.environ['OLLAMA_HOST'] = 'http://localhost:11434'
os.environ['OLLAMA_MODEL'] = 'phi4:14b'
# Client uses env vars automatically
client = OllamaClient()
Using from Another Module
When importing this skill from agents or other code:
import sys
from pathlib import Path
# Add skill to path (use relative path from your location)
skill_path = Path(__file__).parent.parent.parent / ".agent/skills/ollama-client"
sys.path.insert(0, str(skill_path))
from ollama_client import OllamaClient
client = OllamaClient()
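If several modules need this boilerplate, a small helper can centralize it. The module name and relative path below are hypothetical; adjust them to your layout:
# agents/llm.py (hypothetical helper module, not part of the skill)
import sys
from pathlib import Path

def get_client():
    """Add the skill directory to sys.path and return a configured OllamaClient."""
    skill_path = Path(__file__).parent.parent / ".agent/skills/ollama-client"
    if str(skill_path) not in sys.path:
        sys.path.insert(0, str(skill_path))
    from ollama_client import OllamaClient  # import after path setup
    return OllamaClient()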
Health Check
# Check if Ollama server is accessible
if client.is_available():
    print("Ollama server is healthy")
else:
    print("Ollama server unavailable")
Configuration
Environment Variables:
- OLLAMA_HOST: Server URL (default: http://localhost:11434)
- OLLAMA_MODEL: Model name (default: phi4:14b)
Parameters:
- temperature: Sampling temperature (0.0-1.0, default: 0.1 for deterministic outputs)
- max_tokens: Maximum tokens to generate (optional)
- stop_sequences: List of strings to stop generation (optional)
- timeout: Request timeout in seconds (default: 300)
Error Handling
The skill raises exceptions for:
- Timeout: If request exceeds timeout duration
- Connection Error: If Ollama server is unreachable
- API Error: If Ollama returns an error response
All errors include execution time for debugging.
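The exact exception classes depend on ollama_client.py; since the skill is built on requests, a defensive sketch that catches the underlying requests exceptions (an assumption; adjust if the skill defines its own error types):
import logging

import requests

logger = logging.getLogger(__name__)

try:
    result = client.generate(prompt="...", temperature=0.1)
except requests.exceptions.Timeout:
    logger.error("LLM request timed out")  # log, then let the caller decide
    raise
except requests.exceptions.ConnectionError:
    logger.error("Ollama server unreachable")
    raise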
Best Practices
- Low Temperature: Use temperature=0.1 for clinical tasks requiring consistency
- Timeouts: Set appropriate timeouts based on prompt complexity (simple: 60s, complex: 300s)
- Health Checks: Verify server availability before critical operations
- Error Logging: Always log errors with execution time for troubleshooting
Integration with Agents
Agents use this skill for all LLM operations:
- ToC Subagent: Section topic segmentation
- Summary Subagent: Clinical entity extraction
- Recommendation Subagent: Treatment plan generation
- Evaluator Agent: Quality validation reasoning
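As an illustration only (the agents live outside this skill, and this wrapper function is hypothetical), a subagent call might look like this:
def segment_sections(client, document_text):
    """ToC-style topic segmentation via a single LLM call (hypothetical wrapper)."""
    result = client.generate(
        prompt=f"Split the following document into titled sections:\n\n{document_text}",
        temperature=0.1,
    )
    return result["response"]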
Implementation
See ollama_client.py for the full Python implementation.