Agent skill

embedding

Standalone embedding service for semantic search. Runs as a persistent FastAPI server so that, once the model is loaded, embeddings return with millisecond latency. Supports model swapping via environment variables. Use it when you need vectors for any database (ArangoDB, Pinecone, etc.).

Install this agent skill in your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/embedding

SKILL.md

Embedding Skill

Standalone embedding service for semantic search across any database.

Architecture

┌─────────────────────────────────────────┐
│         embedding service (:8602)       │
│  Model: EMBEDDING_MODEL env var         │
│  Device: auto (CPU/GPU)                 │
└───────────────────┬─────────────────────┘
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
 memory        edge-verifier    your-project
 skill         searches         ArangoDB/etc

Quick Start

```bash
# Start the service (first run loads model ~5-10s)
./run.sh serve

# Embed text (CLI)
./run.sh embed --text "your query here"

# Embed via HTTP (after service is running)
curl -X POST http://127.0.0.1:8602/embed -H "Content-Type: application/json" \
  -d '{"text": "your query here"}'
```
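
For callers that prefer Python over curl, the same request can be sketched with only the standard library. The helper names `build_embed_request` and `embed_text` are hypothetical and not part of the skill; the URL and payload come from the curl example above, and the `vector` response field from the API section.

```python
import json
import urllib.request

DEFAULT_URL = "http://127.0.0.1:8602"  # default service address from this README


def build_embed_request(text: str, base_url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build the POST /embed request (hypothetical helper mirroring the curl call)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_text(text: str, base_url: str = DEFAULT_URL, timeout: float = 30.0) -> list[float]:
    """Send the request and return the `vector` field of the JSON response."""
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["vector"]
```

With the service running, `embed_text("your query here")` should return a list of floats. Building the request separately from sending it keeps the payload easy to inspect without a live server.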

Commands

| Command | Description |
|---|---|
| `./run.sh serve` | Start persistent FastAPI server |
| `./run.sh embed --text "..."` | Embed single text (uses service if running) |
| `./run.sh embed --file input.txt` | Embed file contents |
| `./run.sh info` | Show model, device, service status |

Configuration

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model name |
| `EMBEDDING_DEVICE` | `auto` | Device: `auto`, `cpu`, `cuda`, `mps` |
| `EMBEDDING_PORT` | `8602` | Service port |
| `EMBEDDING_SERVICE_URL` | `http://127.0.0.1:8602` | Client connection URL |
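
A client or wrapper script might resolve these variables roughly as follows. This is a sketch, not the skill's actual loader: `EmbeddingConfig` and `load_config` are hypothetical names, and it assumes `EMBEDDING_SERVICE_URL` is independent of `EMBEDDING_PORT` (the README does not say whether one is derived from the other).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EmbeddingConfig:
    model: str
    device: str
    port: int
    service_url: str


def load_config(env: Mapping[str, str] = os.environ) -> EmbeddingConfig:
    """Read the four documented variables, falling back to the documented defaults."""
    device = env.get("EMBEDDING_DEVICE", "auto")
    if device not in {"auto", "cpu", "cuda", "mps"}:
        raise ValueError(f"unsupported EMBEDDING_DEVICE: {device!r}")
    return EmbeddingConfig(
        model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        device=device,
        port=int(env.get("EMBEDDING_PORT", "8602")),
        # Assumption: the URL is not rebuilt from EMBEDDING_PORT when only the port is set
        service_url=env.get("EMBEDDING_SERVICE_URL", "http://127.0.0.1:8602"),
    )
```

Passing the environment in as a mapping makes the defaults easy to check: `load_config({})` yields the values in the table above.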

Swapping Models

```bash
# Use a different model for this project
export EMBEDDING_MODEL="nomic-ai/nomic-embed-text-v1"
./run.sh serve

# Or use a larger model with GPU acceleration
export EMBEDDING_MODEL="intfloat/e5-large-v2"
export EMBEDDING_DEVICE="cuda"
./run.sh serve
```
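
Vectors produced by different models are generally not comparable, and their dimensions may differ, so stored embeddings usually need to be regenerated after a swap. A minimal guard, assuming the `GET /info` payload shape shown in the API section; `check_index_compatible` is a hypothetical helper, and recording the model and dimensions alongside your stored vectors is an assumption about your project, not something the skill is documented to do.

```python
def check_index_compatible(info: dict, index_model: str, index_dimensions: int) -> None:
    """Raise if the running service does not match the model the index was built with.

    `info` is the parsed JSON from GET /info. `index_model` and `index_dimensions`
    are values recorded by the caller when the stored vectors were created.
    """
    if info["dimensions"] != index_dimensions:
        raise ValueError(
            f"service returns {info['dimensions']}-dimensional vectors, "
            f"index stores {index_dimensions}"
        )
    if info["model"] != index_model:
        raise ValueError(
            f"service model {info['model']!r} differs from index model {index_model!r}"
        )
```

Running this check once at startup is cheaper than discovering a mismatch through poor search results.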

API Endpoints

POST /embed

Embed single text.

Request:

```json
{"text": "query to embed"}
```

Response:

```json
{"vector": [0.1, 0.2, ...], "model": "all-MiniLM-L6-v2", "dimensions": 384}
```

POST /embed/batch

Embed multiple texts.

Request:

```json
{"texts": ["query 1", "query 2"]}
```

Response:

```json
{"vectors": [[...], [...]], "model": "...", "count": 2}
```
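
For large inputs it may help to split the texts into several `/embed/batch` calls. The sketch below assumes the request and response shapes shown above. `chunked` and `embed_all` are hypothetical helpers, `post_batch` stands for any function that sends a payload to `/embed/batch` and returns the parsed JSON, and the default batch size of 64 is an arbitrary choice, since the README does not document a limit.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    post_batch: Callable[[dict], dict],
    batch_size: int = 64,  # arbitrary default, not a documented service limit
) -> list[list[float]]:
    """Embed `texts` in order, one /embed/batch call per chunk."""
    vectors: list[list[float]] = []
    for chunk in chunked(texts, batch_size):
        reply = post_batch({"texts": list(chunk)})
        if reply["count"] != len(chunk):
            raise RuntimeError(f"expected {len(chunk)} vectors, got {reply['count']}")
        vectors.extend(reply["vectors"])
    return vectors
```

With httpx, `post_batch` could be `lambda payload: httpx.post(url + "/embed/batch", json=payload).json()`. Injecting it keeps the batching logic testable without a running service.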

GET /info

Service status and configuration.

```json
{
  "model": "all-MiniLM-L6-v2",
  "device": "cuda",
  "dimensions": 384,
  "status": "ready"
}
```

Integration Examples

ArangoDB Semantic Search

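```python
import httpx

# Get the query embedding from the running service
resp = httpx.post("http://127.0.0.1:8602/embed", json={"text": "find similar docs"})
resp.raise_for_status()  # fail early if the service returned an error
vector = resp.json()["vector"]

# AQL query; pass `vector` as the @vector bind variable when executing it
aql = """
FOR doc IN my_collection
  LET score = COSINE_SIMILARITY(doc.embedding, @vector)
  FILTER score > 0.7
  SORT score DESC
  RETURN doc
"""
```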

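To make the `score > 0.7` filter concrete, here is the standard cosine similarity formula in plain Python. It illustrates what the score measures and is not ArangoDB's implementation; the function name is chosen for this example only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), a value between -1 and 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Identical directions score 1 and orthogonal vectors score 0, so a threshold of 0.7 keeps only documents whose embeddings point in a broadly similar direction to the query. A suitable cutoff depends on the model and data, so treat 0.7 as a starting point.
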
From Memory Skill

The memory skill can consume this service by setting:

```bash
export EMBEDDING_SERVICE_URL="http://127.0.0.1:8602"
```

Cold Start

The first invocation loads the model (about 5-10 seconds). After that, embeddings return with millisecond latency. The service logs its progress:

```text
[embedding] Loading model: all-MiniLM-L6-v2...
[embedding] Model loaded in 6.2s
[embedding] Service ready on http://127.0.0.1:8602
```
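
A client started alongside the service may want to wait out the cold start before sending requests. The sketch below polls until `GET /info` reports `"status": "ready"`. `wait_until_ready` is a hypothetical helper, and `fetch_info` stands for any function returning the parsed `/info` JSON. It assumes that while the model loads the service either refuses connections (surfacing as `OSError`, as with urllib) or reports a status other than `ready`; the README does not specify which.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_info: Callable[[], dict],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll `fetch_info` until it reports status "ready" or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        try:
            info = fetch_info()
            if info.get("status") == "ready":
                return info
        except OSError:
            pass  # assumption: connection errors mean the service is still starting
        if clock() >= deadline:
            raise TimeoutError(f"embedding service not ready after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays. If `fetch_info` uses a client whose errors are not `OSError` subclasses, widen the `except` clause accordingly.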
