
distributed-tracing

Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.


Install this agent skill into your project:

npx add-skill https://github.com/wshobson/agents/tree/main/plugins/observability-monitoring/skills/distributed-tracing

SKILL.md

Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

When to Use

Debug latency issues
Understand service dependencies
Identify bottlenecks
Trace error propagation
Analyze request paths

Distributed Tracing Concepts

Trace Structure

Trace (Trace ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]
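The tree can also be read quantitatively. Below is a small stdlib Python sketch that computes each span's self time (its duration minus its children's durations) for the example trace and ranks spans by it. It assumes child spans run sequentially and do not overlap; the `Span` class and helper names are illustrative and not part of any tracing library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list[Span] = field(default_factory=list)


def self_time_ms(span: Span) -> float:
    """Time spent in this span itself, excluding its children.

    Assumes children run sequentially; overlapping (parallel) children
    would make this subtraction undercount the span's own work.
    """
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def walk(span: Span):
    """Yield every span in the tree, depth-first."""
    yield span
    for child in span.children:
        yield from walk(child)


# The example trace from the diagram.
trace = Span("frontend", 100, [
    Span("api-gateway", 80, [
        Span("auth-service", 10),
        Span("user-service", 60, [Span("database", 40)]),
    ]),
])

# Rank spans by self time: the top entry is where the request spends most of its own time.
ranked = sorted(walk(trace), key=self_time_ms, reverse=True)
for span in ranked:
    print(f"{span.name:<13} self={self_time_ms(span):>5.1f}ms total={span.duration_ms}ms")
```

Here the database span dominates (40ms of self time), even though the frontend span has the largest total duration.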

Key Components

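Trace - End-to-end journey of a single request, identified by a trace ID
Span - Single timed operation within a trace; spans form a parent/child tree
Context - Trace ID, span ID, and sampling flag propagated between services
Tags - Key-value pairs for filtering (called attributes in OpenTelemetry)
Logs - Timestamped events within a span (called span events in OpenTelemetry)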

Jaeger Setup

Kubernetes Deployment

bash

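# Deploy the Jaeger Operator.
# Prerequisite: cert-manager must already be installed in the cluster,
# because the operator's admission webhooks depend on it.
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability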

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF

Docker Compose

yaml

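# The top-level "version" key is obsolete in the Compose specification and is omitted.
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "6831:6831/udp" # Agent: Thrift compact (legacy Jaeger clients)
      - "6832:6832/udp" # Agent: Thrift binary (legacy Jaeger clients)
      - "5778:5778" # Agent: sampling configuration
      - "16686:16686" # UI
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "14268:14268" # Collector: Thrift over HTTP
      - "14250:14250" # Collector: gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
      - COLLECTOR_OTLP_ENABLED=true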

Reference: See references/jaeger-setup.md

Application Instrumentation

OpenTelemetry (Recommended)

Python (Flask)

python

from flask import Flask
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize tracer. The Jaeger Thrift exporter is deprecated; Jaeger
# accepts OTLP natively (gRPC on port 4317, HTTP on port 4318).
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True)
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Instrument Flask (creates a server span for every request)
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    with tracer.start_as_current_span("get_users") as span:
        users = fetch_users_from_db()
        span.set_attribute("user.count", len(users))
        return {"users": users}

def fetch_users_from_db():
    # Child span of "get_users", because that span is current here
    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        return query_database()

def query_database():
    # Placeholder for the real database call
    return [{"id": 1, "name": "alice"}]

Node.js (Express)

javascript

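// Set up tracing before requiring express so the instrumentation can patch it.
// Uses the OpenTelemetry JS SDK 2.x API. The Jaeger exporter is deprecated;
// Jaeger accepts OTLP over HTTP on port 4318.
const { trace, SpanStatusCode } = require("@opentelemetry/api");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { resourceFromAttributes } = require("@opentelemetry/resources");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer
const exporter = new OTLPTraceExporter({
  url: "http://jaeger:4318/v1/traces",
});

const provider = new NodeTracerProvider({
  resource: resourceFromAttributes({ "service.name": "my-service" }),
  spanProcessors: [new BatchSpanProcessor(exporter)],
});
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();
const tracer = trace.getTracer("my-service");

// Placeholder for the real data source
async function fetchUsers() {
  return [{ id: 1, name: "alice" }];
}

app.get("/api/users", async (req, res) => {
  // startActiveSpan makes the span current, so spans created inside nest under it
  await tracer.startActiveSpan("get_users", async (span) => {
    try {
      const users = await fetchUsers();
      span.setAttributes({ "user.count": users.length });
      res.json({ users });
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      res.status(500).json({ error: "internal error" });
    } finally {
      span.end();
    }
  });
});

app.listen(3000);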

Go

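go

package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

type User struct {
    ID   int
    Name string
}

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // The Jaeger exporter is deprecated; Jaeger accepts OTLP over gRPC on port 4317.
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("jaeger:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}

// fetchUsersFromDB is a placeholder for the real query.
func fetchUsersFromDB(ctx context.Context) ([]User, error) {
    return []User{{ID: 1, Name: "alice"}}, nil
}

func main() {
    ctx := context.Background()
    tp, err := initTracer(ctx)
    if err != nil {
        panic(err)
    }
    // Flush buffered spans before the program exits.
    defer tp.Shutdown(ctx)

    if _, err := getUsers(ctx); err != nil {
        panic(err)
    }
}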

Reference: See references/instrumentation.md

Context Propagation

HTTP Headers

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
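To make the header format concrete, here is a minimal stdlib sketch of what a propagator does with `traceparent`: parse and validate the version-00 format (2-hex version, 32-hex trace ID, 16-hex parent span ID, flags whose lowest bit means "sampled"), then build the header for a downstream call that keeps the trace ID but carries a new span ID. `tracestate` holds vendor-specific data that is forwarded unchanged, so it is left out. The function names are hypothetical; in practice OpenTelemetry's propagators do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, lowercase hex only (version-00 layout).
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff is forbidden, and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent) -> str:
    """Header for a downstream call: same trace ID, new span ID, same sampled flag."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"


incoming = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(incoming.sampled)  # True
print(child_traceparent(incoming))
```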

Propagation in HTTP Requests

Python

python

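import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Adds traceparent/tracestate (and baggage) from the current context

# Not needed when RequestsInstrumentor is enabled: it injects these headers automatically.
response = requests.get('http://downstream-service/api', headers=headers)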

Node.js

javascript

const { context, propagation } = require("@opentelemetry/api");
const axios = require("axios");

const headers = {};
propagation.inject(context.active(), headers); // Adds traceparent/tracestate from the active context

axios.get("http://downstream-service/api", { headers });

Tempo Setup (Grafana)

Kubernetes Deployment

yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

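    distributor:
      receivers:
        # Bind explicitly to all interfaces: newer Tempo releases default receivers to localhost.
        jaeger:
          protocols:
            thrift_http:
              endpoint: "0.0.0.0:14268"
            grpc:
              endpoint: "0.0.0.0:14250"
        otlp:
          protocols:
            http:
              endpoint: "0.0.0.0:4318"
            grpc:
              endpoint: "0.0.0.0:4317"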

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
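apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tempo # apps/v1 requires a selector matching the pod labels
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          ports:
            - containerPort: 3200 # HTTP API
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config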

Reference: See assets/jaeger-config.yaml.template

Sampling Strategies

Probabilistic Sampling

yaml

# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01

Rate Limiting Sampling

yaml

# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
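A rate limit like this is usually enforced with a token bucket. The stdlib sketch below illustrates the idea and is not Jaeger's implementation: tokens refill at `max_per_second`, each sampled trace spends one, and the bucket holds at most `max_per_second` tokens, so short bursts up to that size are allowed. The class name and the injectable `clock` parameter are assumptions made to keep it testable.

```python
import time
from typing import Callable


class RateLimitingSampler:
    """Token-bucket sampler admitting at most `max_per_second` traces per second."""

    def __init__(self, max_per_second: float, clock: Callable[[], float] = time.monotonic):
        self.rate = max_per_second
        self.capacity = max_per_second  # maximum burst size
        self.tokens = max_per_second    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def should_sample(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the previous decision, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Usage: create one `RateLimitingSampler(100)` per process and call `should_sample()` once per new root trace; with a steady stream of requests it keeps about 100 traces per second regardless of traffic volume.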

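Parent-Based Ratio Sampling

python

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Root spans: keep 1% of traces, decided deterministically from the trace ID.
# Child spans: follow the sampling decision propagated by the parent.
sampler = ParentBased(root=TraceIdRatioBased(0.01))
provider = TracerProvider(sampler=sampler)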
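The "deterministic" part is what keeps traces complete across services. Below is a stdlib sketch of the idea behind `TraceIdRatioBased`, which in the Python SDK compares the low 64 bits of the trace ID against a bound derived from the rate; this is an illustration, not the SDK's source. `parent_based_sample` is a simplification: the real `ParentBased` sampler distinguishes remote and local parents, while here any known parent decision is simply inherited.

```python
import random
from typing import Optional

TRACE_ID_LOW_64 = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_bound(rate: float) -> int:
    """Map a sampling rate in [0.0, 1.0] to a threshold in the 64-bit ID space."""
    return round(rate * (TRACE_ID_LOW_64 + 1))


def ratio_sample(trace_id: int, rate: float) -> bool:
    """Keep the trace iff the low 64 bits of its ID fall below the threshold.

    The decision depends only on the trace ID, so every service using the
    same rate independently reaches the same answer for the same trace.
    """
    return (trace_id & TRACE_ID_LOW_64) < ratio_bound(rate)


def parent_based_sample(parent_sampled: Optional[bool], trace_id: int, rate: float) -> bool:
    """Children inherit the parent's decision; root spans (no parent) use the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sample(trace_id, rate)


# Roughly 1% of random 128-bit trace IDs are kept at rate 0.01.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(ratio_sample(tid, 0.01) for tid in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```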

Trace Analysis

Finding Slow Requests

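Jaeger UI search:

Service: my-service
Min Duration: 1s

Tempo (TraceQL):

{ resource.service.name = "my-service" && duration > 1s }

Finding Errors

Jaeger UI search (tag values must match exactly):

Service: my-service
Tags: error=true

Tempo (TraceQL), which also supports range comparisons such as status codes of 500 and above:

{ resource.service.name = "my-service" && span.http.status_code >= 500 }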
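The same two filters can also be run offline over exported span data. This stdlib sketch assumes a hypothetical flattened record shape (`trace_id`, `service`, `start_ms`, `end_ms`, `http_status`, `error`); trace duration is taken as the latest span end minus the earliest span start.

```python
from collections import defaultdict

# Hypothetical flattened span records, e.g. parsed from an exported JSON file.
spans = [
    {"trace_id": "t1", "service": "my-service", "start_ms": 0, "end_ms": 1500, "http_status": 200, "error": False},
    {"trace_id": "t1", "service": "db", "start_ms": 100, "end_ms": 1400, "http_status": None, "error": False},
    {"trace_id": "t2", "service": "my-service", "start_ms": 0, "end_ms": 120, "http_status": 503, "error": True},
    {"trace_id": "t3", "service": "my-service", "start_ms": 0, "end_ms": 80, "http_status": 200, "error": False},
]


def group_by_trace(records):
    """Collect span records into lists keyed by trace ID."""
    traces = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    return traces


def slow_traces(records, service, threshold_ms=1000):
    """Trace IDs touching `service` whose end-to-end duration exceeds the threshold."""
    result = []
    for trace_id, trace_spans in group_by_trace(records).items():
        if not any(s["service"] == service for s in trace_spans):
            continue
        duration = max(s["end_ms"] for s in trace_spans) - min(s["start_ms"] for s in trace_spans)
        if duration > threshold_ms:
            result.append(trace_id)
    return result


def error_traces(records, service):
    """Trace IDs where `service` flagged an error or returned an HTTP 5xx status."""
    return sorted({
        s["trace_id"] for s in records
        if s["service"] == service and (s["error"] or (s["http_status"] or 0) >= 500)
    })


print(slow_traces(spans, "my-service"))   # ['t1']
print(error_traces(spans, "my-service"))  # ['t2']
```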

Service Dependency Graph

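Jaeger's dependency graph (System Architecture view) is derived from parent/child span relationships and shows:

Service relationships
Call counts between services

With the all-in-one in-memory storage the graph is computed on the fly; with Elasticsearch or Cassandra storage, a separate aggregation job (spark-dependencies) must run periodically to produce it. Per-service request rates, error rates, and latencies come from Jaeger's Monitor tab, which requires a metrics backend such as Prometheus.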
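The core of that aggregation is simple: whenever a span's parent belongs to a different service, count one call on the caller to callee edge. A stdlib sketch, assuming hypothetical `(span_id, parent_span_id, service)` tuples rather than any real Jaeger storage format:

```python
from collections import Counter

# Hypothetical span records: (span_id, parent_span_id, service).
spans = [
    ("a", None, "frontend"),
    ("b", "a", "api-gateway"),
    ("c", "b", "auth-service"),
    ("d", "b", "user-service"),
    ("e", "d", "database"),
    ("f", "d", "database"),
]


def dependency_edges(records):
    """Count caller -> callee service pairs from parent/child span links."""
    service_of = {span_id: service for span_id, _, service in records}
    edges = Counter()
    for _, parent_id, service in records:
        parent_service = service_of.get(parent_id)
        # Skip root spans (no parent) and calls within the same service.
        if parent_service is not None and parent_service != service:
            edges[(parent_service, service)] += 1
    return edges


for (caller, callee), calls in sorted(dependency_edges(spans).items()):
    print(f"{caller} -> {callee}: {calls} call(s)")
```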

Best Practices

Sample appropriately (1-10% in production)
Add meaningful tags (user_id, request_id)
Propagate context across all service boundaries
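Record exceptions on spans and set the span status to error
Use consistent, low-cardinality span names (e.g. "GET /api/users", not URLs containing IDs)
Monitor tracing overhead (aim for under 1% CPU)
Set up alerts for trace errors
Use baggage to propagate cross-service context (e.g. a tenant ID)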
Use span events for important milestones
Document instrumentation standards
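Baggage travels in the W3C `baggage` header as comma-separated `key=value` members with percent-encoded values, optionally followed by `;properties`. In OpenTelemetry the `opentelemetry.baggage` API and the default propagators handle this; the stdlib sketch below only illustrates the wire format, and its function names are hypothetical. Every downstream service receives these values in plain text, so keep entries small and non-sensitive.

```python
from __future__ import annotations

from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Serialize entries into a `baggage` header value (no properties)."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict[str, str]:
    """Parse a `baggage` header value, dropping any ;properties after each member."""
    entries = {}
    for member in header.split(","):
        member = member.split(";", 1)[0].strip()
        if "=" not in member:
            continue  # skip malformed members instead of failing the request
        key, value = member.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"tenant.id": "acme corp", "plan": "gold"})
print(header)  # tenant.id=acme%20corp,plan=gold
print(decode_baggage(header))
```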

Integration with Logging

Correlated Logs

python

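import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    ctx = trace.get_current_span().get_span_context()
    # Outside an active span the context is invalid (IDs are 0), so log "-" instead.
    trace_id = format(ctx.trace_id, "032x") if ctx.is_valid else "-"  # 128-bit -> 32 hex chars
    span_id = format(ctx.span_id, "016x") if ctx.is_valid else "-"    # 64-bit -> 16 hex chars

    # The log formatter must reference %(trace_id)s and %(span_id)s to print these.
    logger.info(
        "Processing request",
        extra={"trace_id": trace_id, "span_id": span_id},
    )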
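Passing `extra=` on every call is easy to forget. A `logging.Filter` attached to the handler can stamp the IDs on every record instead. The stdlib sketch below takes a hypothetical `get_ids` callable so it runs without OpenTelemetry; with OpenTelemetry it would read the current span context as shown in the comment. The `opentelemetry-instrumentation-logging` package (`LoggingInstrumentor`) offers a ready-made alternative that adds `otelTraceID` and `otelSpanID` to records.

```python
import logging
from typing import Callable, Optional, Tuple

# Hypothetical hook returning (trace_id, span_id) as ints, or None with no active span.
# With OpenTelemetry it could be:
#   ctx = trace.get_current_span().get_span_context()
#   return (ctx.trace_id, ctx.span_id) if ctx.is_valid else None
IdSource = Callable[[], Optional[Tuple[int, int]]]


class TraceContextFilter(logging.Filter):
    """Stamp every record with trace_id/span_id so the formatter can print them."""

    def __init__(self, get_ids: IdSource):
        super().__init__()
        self.get_ids = get_ids

    def filter(self, record: logging.LogRecord) -> bool:
        ids = self.get_ids()
        if ids is None:
            record.trace_id = record.span_id = "-"
        else:
            record.trace_id = format(ids[0], "032x")  # 128-bit -> 32 hex chars
            record.span_id = format(ids[1], "016x")   # 64-bit -> 16 hex chars
        return True  # annotate only; never drop records


# Attach to the handler so records from every logger that reaches it get the IDs.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s span=%(span_id)s %(message)s"))
handler.addFilter(TraceContextFilter(lambda: (0x0AF7651916CD43DD8448EB211C80319C, 0xB7AD6B7169203331)))

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Processing request")
# INFO trace=0af7651916cd43dd8448eb211c80319c span=b7ad6b7169203331 Processing request
```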

Troubleshooting

No traces appearing:

Check collector endpoint
Verify network connectivity
Check sampling configuration
Review application logs
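For the connectivity checks, a quick TCP probe run from the application's container or network narrows things down. This is a stdlib sketch; the host name `jaeger` and the port list are assumptions matching the Docker Compose example, so adjust them to your collector. UDP agent ports (6831/6832) cannot be probed this way because UDP has no handshake.

```python
import socket
from typing import Dict, Tuple


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or the host name did not resolve
        return False


def check_endpoints(endpoints: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Probe each named (host, port) pair and report which ones accept connections."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}


# Assumed endpoints; adjust host names and ports to your deployment.
JAEGER_ENDPOINTS = {
    "OTLP gRPC": ("jaeger", 4317),
    "OTLP HTTP": ("jaeger", 4318),
    "Jaeger collector HTTP": ("jaeger", 14268),
    "Jaeger UI": ("jaeger", 16686),
}
```

Usage: `print(check_endpoints(JAEGER_ENDPOINTS))`. An endpoint that reports `False` points at DNS, network policy, or a receiver that is disabled or bound only to localhost, before sampling or application code needs to be suspected.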

High latency overhead:

Reduce sampling rate
Use batch span processor
Check exporter configuration
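Batching helps because it replaces one export call per span with one call per group of spans. The stdlib sketch below shows only that core idea and is not the SDK's code: the Python SDK's `BatchSpanProcessor` additionally exports from a background thread on a timer and drops spans when its queue is full. The class name is hypothetical; the default batch size of 512 mirrors the SDK's default `max_export_batch_size`.

```python
from typing import Callable, List


class BatchingExporter:
    """Buffer finished spans and hand them to `export` in groups."""

    def __init__(self, export: Callable[[List[dict]], None], max_batch_size: int = 512):
        self.export = export
        self.max_batch_size = max_batch_size
        self.buffer: List[dict] = []

    def on_end(self, span: dict) -> None:
        """Called when a span finishes; exports only once a full batch is buffered."""
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Export whatever is buffered (e.g. on a timer or at shutdown)."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.export(batch)
```

With a batch size of 512, ending 1,000 spans triggers one export call immediately and leaves 488 spans for the next flush, instead of 1,000 separate network calls.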

Related Skills

prometheus-configuration - For metrics
grafana-dashboards - For visualization
slo-implementation - For latency SLOs

Maintainer

wshobson Core maintainer

Source details

Full Name: wshobson/agents
Branch: main
Path in repo: plugins/observability-monitoring/skills/distributed-tracing
License: MIT License
Topics: claude-code anthropic anthropic-claude claude claude-code-skills automation agents claudecode claude-skills claude-code-plugins workflows orchestration claude-code-plugin sub-agents claude-code-cli claude-code-subagents subagents claude-code-commands claudecode-config claudecode-subagents
