Agent skill

groq-inference

Fast LLM inference with Groq API - chat, vision, audio STT/TTS, tool use. Use when: groq, fast inference, low latency, whisper, PlayAI TTS, Llama, vision API, tool calling, voice agents, real-time AI.

View SKILL.md on GitHub Repository

Stars 6

Forks 0

Install this agent skill to your Project

npx add-skill https://github.com/ScientiaCapital/skills/tree/main/active/groq-inference-skill

SKILL.md

<quick_start> Basic chat with GROQ:

python

from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Best all-around
    messages=[{"role": "user", "content": prompt}],
)

Model selection:

Use Case	Model
General chat	`llama-3.3-70b-versatile`
Vision/OCR	`meta-llama/llama-4-scout-17b-16e-instruct`
STT	`whisper-large-v3` (GROQ-hosted, NOT OpenAI)
TTS	`playai-tts`
</quick_start>

<success_criteria> GROQ integration is successful when:

Correct model selected for use case (see model table)
API key in environment variable (GROQ_API_KEY)
Retry logic with tenacity for rate limits
Streaming enabled for real-time applications
Async patterns used for parallel queries
NOT using OpenAI (constraint: NO OPENAI) </success_criteria>

<core_content> Ultra-fast LLM inference for real-time applications. GROQ delivers 10-100x faster inference than standard providers.

Quick Reference: Model Selection

Use Case	Model ID	Context	Notes
General Chat	`llama-3.3-70b-versatile`	128K	Best all-around
Fast Chat	`llama-3.1-8b-instant`	128K	Simple tasks, fastest
Vision/OCR	`meta-llama/llama-4-scout-17b-16e-instruct`	128K	Up to 5 images
STT	`whisper-large-v3`	448	GROQ-hosted (NOT OpenAI API)
TTS	`playai-tts`	-	Fritz-PlayAI voice
Reasoning	`meta-llama/llama-4-maverick-17b-128e-instruct`	128K	Thinking models
Tool Use	`compound-beta`	-	Built-in web search, code exec

Core Patterns

1. Chat Completion (Basic + Streaming)

python

import os
from groq import Groq, AsyncGroq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

def chat(prompt: str, system: str = "You are helpful.") -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_completion_tokens=1024,
    )
    return response.choices[0].message.content

# Streaming
def stream_chat(prompt: str):
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

2. Vision / Multimodal

python

import base64

def analyze_image(image_path: str, prompt: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}}
            ]
        }],
    )
    return response.choices[0].message.content

# URL-based: just pass {"url": "https://..."} instead of base64

3. Audio: Speech-to-Text (GROQ-Hosted Whisper)

Note: Whisper on GROQ runs on GROQ hardware - NOT calling OpenAI's API. Whisper is an open-source model that GROQ hosts for fast inference.

python

def transcribe(audio_path: str, language: str = "en") -> str:
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3",  # GROQ-hosted, not OpenAI API
            language=language,
            response_format="verbose_json",  # Includes timestamps
        )
    return result.text

def translate_to_english(audio_path: str) -> str:
    with open(audio_path, "rb") as f:
        result = client.audio.translations.create(file=f, model="whisper-large-v3")
    return result.text

Alternative STT Providers (if you prefer non-Whisper options):

Deepgram - Real-time streaming, lowest latency (pip install deepgram-sdk)
AssemblyAI - High accuracy, speaker diarization (pip install assemblyai)
See voice-ai-skill for Deepgram/AssemblyAI integration patterns

4. Audio: Text-to-Speech (PlayAI)

python

def text_to_speech(text: str, output_path: str = "output.wav"):
    response = client.audio.speech.create(
        model="playai-tts",
        voice="Fritz-PlayAI",  # Also: Arista-PlayAI
        input=text,
        response_format="wav",
    )
    response.write_to_file(output_path)

# Streaming TTS
def stream_tts(text: str):
    with client.audio.speech.with_streaming_response.create(
        model="playai-tts", voice="Fritz-PlayAI", input=text, response_format="wav"
    ) as response:
        for chunk in response.iter_bytes(1024):
            yield chunk

Alternative TTS Providers (beyond GROQ's PlayAI):

Cartesia - Ultra-low latency, emotional control (pip install cartesia)
ElevenLabs - Most natural voices, voice cloning (pip install elevenlabs)
Deepgram - Fast, cost-effective (pip install deepgram-sdk)
See voice-ai-skill for Cartesia/ElevenLabs/Deepgram TTS integration patterns

5. Tool Use / Function Calling

python

import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

def chat_with_tools(prompt: str):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile", messages=messages, tools=tools, tool_choice="auto"
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        for tc in msg.tool_calls:
            result = execute_function(tc.function.name, json.loads(tc.function.arguments))
            messages.extend([msg, {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}])
        return client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools).choices[0].message.content
    return msg.content

6. Compound Beta (Built-in Web Search + Code Exec)

python

def compound_query(prompt: str):
    """Built-in tools: web_search, code_execution."""
    response = client.chat.completions.create(
        model="compound-beta",
        messages=[{"role": "user", "content": prompt}],
    )
    msg = response.choices[0].message
    # Access msg.executed_tools for tool results
    return msg.content

7. Reasoning Models

python

def reasoning_query(prompt: str, format: str = "parsed"):
    """format: 'parsed' (structured), 'raw' (visible), 'hidden' (no thinking)"""
    response = client.chat.completions.create(
        model="meta-llama/llama-4-maverick-17b-128e-instruct",
        messages=[{"role": "user", "content": prompt}],
        reasoning_format=format,
    )
    msg = response.choices[0].message
    if format == "parsed" and hasattr(msg, 'reasoning'):
        return {"thinking": msg.reasoning, "answer": msg.content}
    return msg.content

8. Async Patterns

python

async_client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))

async def async_chat(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def parallel_queries(prompts: list[str]) -> list[str]:
    import asyncio
    return await asyncio.gather(*[async_chat(p) for p in prompts])

Rate Limits

Tier	Requests/min	Tokens/min	Tokens/day
Free	30	15,000	500,000
Paid	100+	100,000+	Unlimited

python

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def reliable_chat(prompt: str) -> str:
    return chat(prompt)

Integration Notes

Pairs with: voice-ai-skill (Whisper STT + PlayAI TTS), langgraph-agents-skill
Complements: trading-signals-skill (fast analysis), data-analysis-skill
Projects: VozLux (voice agents), FieldVault-AI (document processing)
Constraint: NO OPENAI - GROQ is the fast inference layer

Environment Variables

bash

GROQ_API_KEY=gsk_...  # Required - get from console.groq.com

# Optional multi-provider
ANTHROPIC_API_KEY=    # Claude for complex reasoning
GOOGLE_API_KEY=       # Gemini fallback

Reference Files

reference/models-catalog.md - Complete model catalog with specs
reference/audio-speech.md - Whisper STT and PlayAI TTS deep dive
reference/vision-multimodal.md - Multimodal and image processing
reference/tool-use-patterns.md - Function calling and Compound Beta
reference/reasoning-models.md - Thinking models and reasoning_format
reference/cost-optimization.md - Batch API, caching, provider routing

Emit Outcome Sidecar

As the final step, write to ~/.claude/skill-analytics/last-outcome-groq-inference.json:

json

{"ts":"[UTC ISO8601]","skill":"groq-inference","version":"1.0.0","variant":"default",
 "status":"[success|partial|error]","runtime_ms":[estimated ms from start],
 "metrics":{"requests_made":[n],"tokens_used":[n],"models_used":[n]},
 "error":null,"session_id":"[YYYY-MM-DD]"}

Use status "partial" if some stages failed but results were produced. Use "error" only if no output was generated.

Maintainer

ScientiaCapital Core maintainer

Source details

Full Name: ScientiaCapital/skills
Branch: main
Path in repo: active/groq-inference-skill

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

ScientiaCapital/skills

project-context

Maintains project context and progress tracking across Claude sessions. Use at session start to load context, on session end to save progress. Triggers: load project context, save context, end session, what was I working on, switch to project, done for today.

6 0

Explore

ScientiaCapital/skills

workflow-enforcer-skill

Enforces workflow discipline across ALL projects. Ensures Claude checks for specialized agents before responding, announces skill/agent usage, and creates TodoWrite todos for multi-step tasks. Triggers: automatic on all sessions, use the right agent, follow workflow.

6 0

Explore

ScientiaCapital/skills

gtm-pricing

B2B go-to-market strategy, pricing models, ICP development, positioning, and competitive intelligence. Use when planning GTM strategy, setting pricing, defining ICP, or evaluating opportunities.

6 0

Explore

ScientiaCapital/skills

subagent-teams

Orchestrate in-session Task tool teams for parallel work. Fan-out research, implementation, review, and documentation across subagents. Use when: parallel tasks, fan-out, subagent team, Task tool, in-session agents.

6 0

Explore

ScientiaCapital/skills

planning-prompts

Comprehensive skill for project planning and prompt engineering. Covers hierarchical plans (briefs, roadmaps, phases), Claude-to-Claude meta-prompts, and multi-stage workflows. Use when: planning, prompt creation, agentic pipeline work, project roadmap, meta-prompts, research to implement workflow.

6 0

Explore

ScientiaCapital/skills

linkedin-sales-navigator-alt-skill

Build targeted prospect lists by analyzing LinkedIn profiles, extracting job titles, companies, locations, and recent activity. Identifies decision-makers, tracks job changes for warm outreach, and enriches contact data. Use when users need to find prospects, build lead lists, or track decision-maker movements.

6 0

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Quick Reference: Model Selection

Core Patterns

1. Chat Completion (Basic + Streaming)

2. Vision / Multimodal

3. Audio: Speech-to-Text (GROQ-Hosted Whisper)

4. Audio: Text-to-Speech (PlayAI)

5. Tool Use / Function Calling

6. Compound Beta (Built-in Web Search + Code Exec)

7. Reasoning Models

8. Async Patterns

Rate Limits

Integration Notes

Environment Variables

Reference Files

Emit Outcome Sidecar

Recommended Agent Skills

project-context

workflow-enforcer-skill

gtm-pricing

subagent-teams

planning-prompts

linkedin-sales-navigator-alt-skill