Agent skill
sentence-transformers
Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.
Install this agent skill to your Project
npx add-skill https://github.com/Orchestra-Research/AI-Research-SKILLs/tree/main/15-rag/sentence-transformers
SKILL.md
Sentence Transformers - State-of-the-Art Embeddings
Python framework for sentence and text embeddings using transformers.
When to use Sentence Transformers
Use when:
- Need high-quality embeddings for RAG
- Semantic similarity and search
- Text clustering and classification
- Multilingual embeddings (100+ languages)
- Running embeddings locally (no API)
- Cost-effective alternative to OpenAI embeddings
Metrics:
- 15,700+ GitHub stars
- 5000+ pre-trained models
- 100+ languages supported
- Based on PyTorch/Transformers
Use alternatives instead:
- OpenAI Embeddings: Need API-based, highest quality
- Instructor: Task-specific instructions
- Cohere Embed: Managed service
Quick start
Installation
pip install sentence-transformers
Basic usage
from sentence_transformers import SentenceTransformer
# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Generate embeddings
sentences = [
"This is an example sentence",
"Each sentence is converted to a vector"
]
embeddings = model.encode(sentences)
print(embeddings.shape) # (2, 384)
# Cosine similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
Popular models
General purpose
# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')
# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')
Multilingual
# 50+ languages
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
# 100+ languages
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')
Domain-specific
# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')
# Scientific papers
model = SentenceTransformer('allenai/specter')
# Code
model = SentenceTransformer('microsoft/codebert-base')
Semantic search
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')
# Corpus
corpus = [
"Python is a programming language",
"Machine learning uses algorithms",
"Neural networks are powerful"
]
# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)
# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)
Similarity computation
# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)
# Dot product
similarity = util.dot_score(embedding1, embedding2)
# Pairwise cosine similarity
similarities = util.cos_sim(embeddings, embeddings)
Batch encoding
# Efficient batch processing
sentences = ["sentence 1", "sentence 2", ...] * 1000
embeddings = model.encode(
sentences,
batch_size=32,
show_progress_bar=True,
convert_to_tensor=False # or True for PyTorch tensors
)
Fine-tuning
from sentence_transformers import InputExample, losses
from torch.utils.data import DataLoader
# Training data
train_examples = [
InputExample(texts=['sentence 1', 'sentence 2'], label=0.8),
InputExample(texts=['sentence 3', 'sentence 4'], label=0.3),
]
train_dataloader = DataLoader(train_examples, batch_size=16)
# Loss function
train_loss = losses.CosineSimilarityLoss(model)
# Train
model.fit(
train_objectives=[(train_dataloader, train_loss)],
epochs=10,
warmup_steps=100
)
# Save
model.save('my-finetuned-model')
LangChain integration
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-mpnet-base-v2"
)
# Use with vector stores
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(
documents=docs,
embedding=embeddings
)
LlamaIndex integration
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(
model_name="sentence-transformers/all-mpnet-base-v2"
)
from llama_index.core import Settings
Settings.embed_model = embed_model
# Use in index
index = VectorStoreIndex.from_documents(documents)
Model selection guide
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast | Good | General, prototyping |
| all-mpnet-base-v2 | 768 | Medium | Better | Production RAG |
| all-roberta-large-v1 | 1024 | Slow | Best | High accuracy needed |
| paraphrase-multilingual | 768 | Medium | Good | Multilingual |
Best practices
- Start with all-MiniLM-L6-v2 - Good baseline
- Normalize embeddings - Better for cosine similarity
- Use GPU if available - 10× faster encoding
- Batch encoding - More efficient
- Cache embeddings - Expensive to recompute
- Fine-tune for domain - Improves quality
- Test different models - Quality varies by task
- Monitor memory - Large models need more RAM
Performance
| Model | Speed (sentences/sec) | Memory | Dimension |
|---|---|---|---|
| MiniLM | ~2000 | 120MB | 384 |
| MPNet | ~600 | 420MB | 768 |
| RoBERTa | ~300 | 1.3GB | 1024 |
Resources
- GitHub: https://github.com/UKPLab/sentence-transformers ⭐ 15,700+
- Models: https://huggingface.co/sentence-transformers
- Docs: https://www.sbert.net
- License: Apache 2.0
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
peft-fine-tuning
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.
llama-factory
Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
unsloth
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
axolotl
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
moe-training
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.
model-pruning
Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.
Didn't find tool you were looking for?