golang-performance
Go performance optimization techniques including profiling with pprof, memory optimization, concurrency patterns, and escape analysis.
Golang Performance
This skill provides guidance on optimizing Go application performance including profiling, memory management, concurrency optimization, and avoiding common performance pitfalls.
When to Use This Skill
- When profiling Go applications for CPU or memory issues
- When optimizing memory allocations and reducing GC pressure
- When implementing efficient concurrency patterns
- When analyzing escape analysis results
- When optimizing hot paths in production code
Profiling with pprof
Enable Profiling in HTTP Server
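import (
	"log"
	"net/http"
	_ "net/http/pprof" // Registers handlers on http.DefaultServeMux
)
func main() {
	// pprof endpoints available at /debug/pprof/ (bound to localhost only)
	go func() {
		// ListenAndServe always returns a non-nil error; log it instead of dropping it
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// Main application
}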
CPU Profiling
# Collect 30-second CPU profile
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30" # Quoted so the shell does not glob the "?"
# Interactive commands
(pprof) top10 # Top 10 functions by CPU
(pprof) list FuncName # Show source with timing
(pprof) web # Open call graph (SVG) in browser; flame graphs are in the web UI (go tool pprof -http=:8080 <profile>)
Memory Profiling
# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Allocs profile (all allocations)
go tool pprof http://localhost:6060/debug/pprof/allocs
# Interactive commands
(pprof) top10 -cum # Top by cumulative allocations
(pprof) list FuncName # Show allocation sites
Programmatic Profiling
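import (
	"os"
	"runtime"
	"runtime/pprof"
)
func profileCPU() error {
	f, err := os.Create("cpu.prof")
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	// Code to profile
	return nil
}
func profileMemory() error {
	f, err := os.Create("mem.prof")
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // Run a collection first so the profile reflects up-to-date statistics
	return pprof.WriteHeapProfile(f)
}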
Memory Optimization
Reduce Allocations
// BAD: Reallocates and copies each time append outgrows the backing array
func Process(items []string) []string {
result := []string{}
for _, item := range items {
result = append(result, transform(item))
}
return result
}
// GOOD: Pre-allocate with known capacity
func Process(items []string) []string {
result := make([]string, 0, len(items))
for _, item := range items {
result = append(result, transform(item))
}
return result
}
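One way to check the effect of pre-allocation is to count heap allocations directly. The sketch below is illustrative only: `grow`, `prealloc`, `allocs`, and `sink` are hypothetical names, and it uses `[]int` instead of `[]string` so that only the slice growth is measured. It relies on `testing.AllocsPerRun` from the standard library.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the slices away.
var sink []int

// grow appends into an empty slice, so append must reallocate as it grows.
func grow(n int) {
	result := []int{}
	for i := 0; i < n; i++ {
		result = append(result, i)
	}
	sink = result
}

// prealloc reserves capacity up front, so append never reallocates.
func prealloc(n int) {
	result := make([]int, 0, n)
	for i := 0; i < n; i++ {
		result = append(result, i)
	}
	sink = result
}

// allocs reports the average number of heap allocations per call for both variants.
func allocs(n int) (grown, preallocated float64) {
	grown = testing.AllocsPerRun(100, func() { grow(n) })
	preallocated = testing.AllocsPerRun(100, func() { prealloc(n) })
	return grown, preallocated
}

func main() {
	g, p := allocs(1000)
	fmt.Printf("grow: %.0f allocs/op, prealloc: %.0f allocs/op\n", g, p)
}
```

The exact counts depend on the Go version's growth policy, so treat the numbers as relative rather than absolute.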
Use sync.Pool for Frequent Allocations
var bufferPool = sync.Pool{
New: func() interface{} {
return new(bytes.Buffer)
},
}
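func ProcessRequest(data []byte) []byte {
	buf := bufferPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufferPool.Put(buf)
	}()
	// Use buffer
	buf.Write(data)
	// Copy the result out: buf.Bytes() aliases memory that is reused
	// as soon as the buffer goes back to the pool
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}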
Avoid String Concatenation in Loops
// BAD: Every += allocates a new string and copies the old one (O(n^2) bytes copied)
func BuildString(parts []string) string {
result := ""
for _, part := range parts {
result += part
}
return result
}
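// GOOD: strings.Builder appends in place; Grow with the final size makes it a single allocation
func BuildString(parts []string) string {
	total := 0
	for _, part := range parts {
		total += len(part)
	}
	var builder strings.Builder
	builder.Grow(total)
	for _, part := range parts {
		builder.WriteString(part)
	}
	return builder.String()
}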
Slice Memory Leaks
// BAD: Keeps entire backing array alive
func GetFirst(data []byte) []byte {
return data[:10]
}
// GOOD: Copy to release backing array
func GetFirst(data []byte) []byte {
result := make([]byte, 10)
copy(result, data[:10])
return result
}
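The retention is visible in the slice capacity: a sub-slice still reports the capacity of the whole backing array, while a copy reports only its own. The following is a minimal sketch; `firstView` and `firstCopy` are hypothetical names that mirror the two `GetFirst` variants above, with the length passed in as a parameter.

```go
package main

import "fmt"

// firstView returns a sub-slice that shares the caller's backing array.
func firstView(data []byte, n int) []byte {
	return data[:n]
}

// firstCopy returns an independent copy, so the large array can be collected
// once the caller drops its own reference.
func firstCopy(data []byte, n int) []byte {
	out := make([]byte, n)
	copy(out, data[:n])
	return out
}

func main() {
	big := make([]byte, 1<<20) // 1 MiB backing array
	view := firstView(big, 10)
	cp := firstCopy(big, 10)
	fmt.Println(len(view), cap(view)) // capacity still spans the whole array
	fmt.Println(len(cp), cap(cp))     // capacity covers only the copied bytes
}
```

Both variants panic if the input is shorter than `n`, so callers need a length check first. On Go 1.20 and later, `bytes.Clone` performs the same copy in one call.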
Escape Analysis
# Show escape analysis decisions
go build -gcflags="-m" ./...
# More verbose
go build -gcflags="-m -m" ./...
Avoiding Heap Escapes
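// USUALLY ESCAPES: The returned pointer outlives the call
func NewUser() *User {
	return &User{} // Heap-allocated unless the call is inlined and the value stays local
}
// NO ESCAPE IN THE CALLEE: Value return
func NewUser() User {
	return User{} // Copied to the caller; may stay on stack
}
// MAY ESCAPE: Interface conversion
func Process(v interface{}) { ... }
func main() {
	x := 42
	Process(x) // x is boxed; it moves to the heap if Process lets v escape
}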
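To see the compiler's decisions for the pointer and value cases, a small file like the one below can be built with `go build -gcflags="-m" .`. This is a sketch under assumptions: `point`, `newPointPtr`, and `newPointVal` are hypothetical names, and the `//go:noinline` directives are there only so that inlining does not hide the escape. The exact diagnostic wording varies between Go versions.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPointPtr returns a pointer to a value created inside the function,
// a typical candidate for an "escapes to heap" diagnostic.
//
//go:noinline
func newPointPtr(x, y int) *point {
	return &point{x, y}
}

// newPointVal returns by value; nothing in this function needs the heap.
//
//go:noinline
func newPointVal(x, y int) point {
	return point{x, y}
}

func main() {
	p := newPointPtr(1, 2)
	v := newPointVal(3, 4)
	fmt.Println(p.x+p.y, v.x+v.y) // prints: 3 7
}
```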
Concurrency Optimization
Worker Pool Pattern
func ProcessItems(items []Item, workers int) []Result {
jobs := make(chan Item, len(items))
results := make(chan Result, len(items))
// Start workers
var wg sync.WaitGroup
for i := 0; i < workers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for item := range jobs {
results <- process(item)
}
}()
}
// Send jobs
for _, item := range items {
jobs <- item
}
close(jobs)
// Wait and collect
go func() {
wg.Wait()
close(results)
}()
var output []Result
for r := range results {
output = append(output, r)
}
return output
}
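Note that the pool above returns results in completion order, not input order. When order matters, one option is to send indexes instead of items and have each worker write into its own slot of a pre-sized slice. The sketch below assumes `workers` is at least 1; `processOrdered` and `square` are hypothetical names standing in for the real work.

```go
package main

import (
	"fmt"
	"sync"
)

// square is a hypothetical stand-in for the per-item work.
func square(n int) int { return n * n }

// processOrdered fans work out to `workers` goroutines and writes each result
// into the slot matching its input index, so output order equals input order.
func processOrdered(items []int, workers int) []int {
	out := make([]int, len(items))
	jobs := make(chan int) // carries indexes into items

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = square(items[i]) // each index is written by exactly one worker
			}
		}()
	}

	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait() // all writes to out happen before this returns
	return out
}

func main() {
	fmt.Println(processOrdered([]int{1, 2, 3, 4, 5}, 3)) // prints: [1 4 9 16 25]
}
```

Because every worker writes to a distinct element, no mutex or results channel is needed.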
Buffered Channels for Throughput
// Unbuffered: each send blocks until a receiver is ready
ch := make(chan int)
// Buffered: sends block only when the buffer is full, which can absorb bursts
ch := make(chan int, 100)
Avoid Lock Contention
// BAD: Global lock
var mu sync.Mutex
var cache = make(map[string]string)
func Get(key string) string {
mu.Lock()
defer mu.Unlock()
return cache[key]
}
// GOOD: Sharded locks
type ShardedCache struct {
shards [256]struct {
mu sync.RWMutex
items map[string]string
}
}
func (c *ShardedCache) getShard(key string) *struct {
mu sync.RWMutex
items map[string]string
} {
h := fnv.New32a()
h.Write([]byte(key))
return &c.shards[h.Sum32()%256]
}
func (c *ShardedCache) Get(key string) string {
shard := c.getShard(key)
shard.mu.RLock()
defer shard.mu.RUnlock()
return shard.items[key]
}
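The sharded snippet above only reads. Before any write, each shard's map has to be initialized, because assigning into a nil map panics. A fuller sketch might look like the following; the named `shard` type, `shardCount`, `NewShardedCache`, `Set`, and the two-value `Get` are assumptions added for illustration, not part of the original snippet.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 256

type shard struct {
	mu    sync.RWMutex
	items map[string]string
}

type ShardedCache struct {
	shards [shardCount]shard
}

// NewShardedCache initializes every shard's map; writing to a nil map panics.
func NewShardedCache() *ShardedCache {
	c := &ShardedCache{}
	for i := range c.shards {
		c.shards[i].items = make(map[string]string)
	}
	return c
}

// getShard picks a shard by hashing the key with FNV-1a.
func (c *ShardedCache) getShard(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.shards[h.Sum32()%shardCount]
}

// Get takes only the read lock of the shard that owns the key.
func (c *ShardedCache) Get(key string) (string, bool) {
	s := c.getShard(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[key]
	return v, ok
}

// Set takes the write lock of one shard, leaving the other shards uncontended.
func (c *ShardedCache) Set(key, value string) {
	s := c.getShard(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = value
}

func main() {
	c := NewShardedCache()
	c.Set("lang", "go")
	fmt.Println(c.Get("lang")) // prints: go true
}
```

Naming the shard type also removes the need to repeat the anonymous struct in the `getShard` signature.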
Use sync.Map for Specific Cases
// Good for: keys written once, read many; disjoint key sets
var cache sync.Map
func Get(key string) (string, bool) {
v, ok := cache.Load(key)
if !ok {
return "", false
}
return v.(string), true
}
func Set(key, value string) {
cache.Store(key, value)
}
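For the write-once, read-many case, `sync.Map.LoadOrStore` combines the lookup and the first write into one atomic step. A minimal sketch, where `getOrStore` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sync"
)

var cache sync.Map

// getOrStore returns the cached value for key, storing fallback first if the
// key is absent. The boolean reports whether the value was already present.
func getOrStore(key, fallback string) (string, bool) {
	v, loaded := cache.LoadOrStore(key, fallback)
	return v.(string), loaded
}

func main() {
	fmt.Println(getOrStore("region", "eu")) // prints: eu false
	fmt.Println(getOrStore("region", "us")) // prints: eu true
}
```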
Data Structure Optimization
Struct Field Ordering (Memory Alignment)
// BAD: 24 bytes on 64-bit platforms (14 bytes of padding)
type Bad struct {
a bool // 1 byte + 7 padding
b int64 // 8 bytes
c bool // 1 byte + 7 padding
}
// GOOD: 16 bytes on 64-bit platforms (only 6 bytes of trailing padding)
type Good struct {
b int64 // 8 bytes
a bool // 1 byte
c bool // 1 byte + 6 padding
}
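The sizes in the comments can be checked with `unsafe.Sizeof`. The sketch below repeats the two struct definitions and assumes a 64-bit platform such as amd64 or arm64; on 32-bit targets the alignment of `int64` can differ, and so can the totals. `sizes` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Bad struct {
	a bool
	b int64
	c bool
}

type Good struct {
	b int64
	a bool
	c bool
}

// sizes returns the in-memory size of each layout in bytes.
func sizes() (bad, good uintptr) {
	return unsafe.Sizeof(Bad{}), unsafe.Sizeof(Good{})
}

func main() {
	bad, good := sizes()
	fmt.Println(bad, good) // 24 16 on 64-bit platforms (assumption stated above)
}
```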
Avoid Interface{} When Possible
// SLOW: Type assertions, boxing
func Sum(values []interface{}) float64 {
var sum float64
for _, v := range values {
sum += v.(float64)
}
return sum
}
// FAST: Concrete types
func Sum(values []float64) float64 {
var sum float64
for _, v := range values {
sum += v
}
return sum
}
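When the same function has to serve several numeric types, generics (Go 1.18 and later) keep the concrete-type fast path without falling back to `interface{}`. This is a sketch only; the `Number` constraint is a hypothetical name and its type list is an assumption.

```go
package main

import "fmt"

// Number is a hypothetical constraint listing the numeric types this sketch accepts.
type Number interface {
	~int | ~int64 | ~float32 | ~float64
}

// Sum adds the elements without boxing them into interface values
// and without type assertions.
func Sum[T Number](values []T) T {
	var sum T
	for _, v := range values {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(Sum([]float64{1.5, 2.5, 3})) // prints: 7
	fmt.Println(Sum([]int{1, 2, 3}))         // prints: 6
}
```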
Benchmarking Patterns
func BenchmarkProcess(b *testing.B) {
data := generateTestData()
b.ResetTimer() // Exclude setup time
for i := 0; i < b.N; i++ {
Process(data)
}
}
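// Memory benchmarks
var sink []byte // Package-level sink forces the slice to escape to the heap
func BenchmarkAllocs(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		// Assigning to _ lets the compiler keep the slice on the stack and report 0 allocs
		sink = make([]byte, 1024)
	}
}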
// Compare implementations
func BenchmarkComparison(b *testing.B) {
b.Run("old", func(b *testing.B) {
for i := 0; i < b.N; i++ {
OldImplementation()
}
})
b.Run("new", func(b *testing.B) {
for i := 0; i < b.N; i++ {
NewImplementation()
}
})
}
Run with:
go test -bench=. -benchmem ./...
go test -bench=. -benchtime=5s ./... # Longer runs
Common Pitfalls
Defer in Hot Loops
// BAD: Deferred calls run only when the enclosing function returns
for _, item := range items {
	mu.Lock()
	defer mu.Unlock() // Not run per iteration: the second mu.Lock() deadlocks
	process(item)
}
// GOOD: Explicit unlock
for _, item := range items {
mu.Lock()
process(item)
mu.Unlock()
}
// BETTER: Extract to function
for _, item := range items {
processWithLock(item)
}
func processWithLock(item Item) {
mu.Lock()
defer mu.Unlock()
process(item)
}
JSON Encoding Performance
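// Marshal and Encoder both use reflection (per-type metadata is cached)
// SLOWER: Marshal allocates a new []byte result on every call
json.Marshal(v)
// FASTER: An Encoder writes straight into an existing buffer or io.Writer
var buf bytes.Buffer
encoder := json.NewEncoder(&buf)
encoder.Encode(v)
// FASTEST: Code generation (easyjson, ffjson) avoids reflection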
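As a fuller illustration of the encoder approach, the sketch below encodes several values through one `json.Encoder` into a shared buffer. The `user` type and `encodeAll` function are hypothetical; whether this beats `json.Marshal` in a given workload should be confirmed with a benchmark.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// user is a hypothetical payload type.
type user struct {
	Name string `json:"name"`
	ID   int    `json:"id"`
}

// encodeAll writes every value into one shared buffer through a single Encoder.
// Encode appends a newline after each value, so the result is newline-delimited JSON.
func encodeAll(users []user) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, u := range users {
		if err := enc.Encode(u); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeAll([]user{{Name: "ada", ID: 1}, {Name: "bob", ID: 2}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```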
Best Practices
- Measure before optimizing - Profile to find actual bottlenecks
- Pre-allocate slices - Use make([]T, 0, capacity) when size is known
- Pool frequently allocated objects - Use sync.Pool for buffers
- Minimize allocations in hot paths - Reuse objects, avoid interfaces
- Right-size channels - Buffer to reduce blocking without wasting memory
- Avoid premature optimization - Clarity first, optimize measured problems
- Use value receivers for small structs - Avoid pointer indirection
- Order struct fields by size - Largest to smallest reduces padding