Agent skill
Python Performance
Master Python optimization techniques, profiling, memory management, and high-performance computing
Install this agent skill to your Project
npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-python/tree/main/skills/python-performance
SKILL.md
Python Performance Optimization
Overview
Master performance optimization in Python. Learn to profile code, identify bottlenecks, optimize algorithms, manage memory efficiently, and leverage high-performance libraries for compute-intensive tasks.
Learning Objectives
- Profile Python code to identify bottlenecks
- Optimize algorithms and data structures
- Manage memory efficiently
- Use compiled extensions (Cython, NumPy)
- Implement caching strategies
- Parallelize CPU-bound operations
- Benchmark and measure improvements
Core Topics
1. Profiling & Benchmarking
- timeit module for micro-benchmarks
- cProfile for function-level profiling
- line_profiler for line-by-line analysis
- memory_profiler for memory usage
- py-spy for production profiling
- Flame graphs and visualization
Code Example:
import timeit
import cProfile
import pstats
# 1. timeit for micro-benchmarks
def list_comprehension():
    return [x**2 for x in range(1000)]
def map_function():
    return list(map(lambda x: x**2, range(1000)))
# Compare performance
time_lc = timeit.timeit(list_comprehension, number=10000)
time_map = timeit.timeit(map_function, number=10000)
print(f"List comprehension: {time_lc:.4f}s")
print(f"Map function: {time_map:.4f}s")
# 2. cProfile for function profiling
def process_data():
    data = []
    for i in range(100000):
        data.append(i ** 2)
    return sum(data)
profiler = cProfile.Profile()
profiler.enable()
result = process_data()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
# 3. Line profiling (requires the line_profiler package)
# kernprof injects @profile into builtins at run time, so uncomment the
# decorator only when running under kernprof; otherwise it raises NameError.
# @profile
def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total
# Run with: kernprof -l -v script.py
# 4. Memory profiling
from memory_profiler import profile
@profile
def memory_intensive():
    large_list = [i for i in range(1000000)]
    large_dict = {i: i**2 for i in range(1000000)}
    return len(large_list) + len(large_dict)
# Run with: python -m memory_profiler script.py
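A single timeit run can be skewed by whatever else the machine is doing. The following is a minimal sketch, assuming a hypothetical helper named best_time (not part of the timeit module), that repeats each measurement with timeit.repeat and keeps the fastest run:

```python
import timeit


def best_time(func, number=1000, repeat=5):
    """Return the best per-call time in seconds across `repeat` runs.

    `best_time` is a hypothetical helper name used for illustration.
    """
    runs = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other system activity.
    return min(runs) / number


def list_comprehension():
    return [x**2 for x in range(1000)]


def map_function():
    return list(map(lambda x: x**2, range(1000)))


if __name__ == "__main__":
    for func in (list_comprehension, map_function):
        per_call = best_time(func)
        print(f"{func.__name__}: {per_call * 1e6:.1f} microseconds per call")
```

Taking the minimum rather than the mean is a common convention for micro-benchmarks, since slower runs mostly reflect interference rather than the code under test.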
2. Algorithm & Data Structure Optimization
- Choosing efficient data structures
- Time complexity analysis
- Generator expressions vs lists
- Set operations for lookups
- Deque for queue operations
- Bisect for sorted lists
Code Example:
import bisect
from collections import deque, Counter, defaultdict
import time
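# 1. List vs set for membership testing
items = list(range(100000))
items_set = set(items)  # Build the set once; the conversion itself is O(n)
# Bad: O(n) lookup
def find_in_list(items, target):
    return target in items  # Linear search
# Good: O(1) average-case lookup
def find_in_set(items_set, target):
    return target in items_set  # Hash lookup
# The set only pays off when it is built once and queried many times.
# Measure with timeit: the speedup depends on list size and target position.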
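# 2. Generator expressions for memory efficiency
# Bad: Creates the entire list in memory
squares_list = [x**2 for x in range(1000000)]  # ~8 MB for the list itself on 64-bit CPython, plus the int objects
# Good: Generates values on demand (can be iterated only once)
squares_gen = (x**2 for x in range(1000000))  # A few hundred bytes at most, regardless of range size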
# 3. Deque for efficient queue operations
# Bad: O(n) pop from beginning
queue_list = list(range(10000))
queue_list.pop(0) # Slow
# Good: O(1) pop from both ends
queue_deque = deque(range(10000))
queue_deque.popleft() # Fast
# 4. Bisect for maintaining sorted lists
# Bad: Re-sorts the whole list after every insertion
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    sorted_list.append(i)
    sorted_list.sort()
# Good: O(log n) search for the insertion point (the insert itself still shifts elements, O(n))
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    bisect.insort(sorted_list, i)
# 5. Counter for frequency counting
words = "the quick brown fox jumps over the lazy dog the end".split()  # Sample input (illustrative)
# Bad: Manual counting
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1
# Good: Counter
word_count = Counter(words)
most_common = word_count.most_common(10)
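The list-versus-set comparison above is easiest to trust when measured on your own machine. This is a small sketch, assuming a hypothetical function named membership_times, that times the same lookup against both containers. The set is built outside the timed region, and the target is the last element, which is the worst case for the list:

```python
import timeit


def membership_times(size=100_000, number=200):
    """Time `target in container` for a list and a set holding the same items.

    `membership_times` is a hypothetical helper name used for illustration.
    """
    items = list(range(size))
    items_set = set(items)  # built once, outside the timed region
    target = size - 1       # last element: worst case for the linear scan
    t_list = timeit.timeit(lambda: target in items, number=number)
    t_set = timeit.timeit(lambda: target in items_set, number=number)
    return t_list, t_set


if __name__ == "__main__":
    t_list, t_set = membership_times()
    print(f"list: {t_list:.6f}s  set: {t_set:.6f}s  ratio: {t_list / t_set:.0f}x")
```

The ratio grows with the list size, so a single headline speedup figure should be read as an example rather than a constant.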
3. Memory Management
- Memory allocation and garbage collection
- Object pooling
- Slots for memory-efficient classes
- Reference counting
- Weak references
- Memory leak detection
Code Example:
import gc
import sys
from weakref import WeakValueDictionary
# 1. __slots__ for memory-efficient classes
# Bad: Regular class (every instance carries per-instance attribute storage)
class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y
# Good: Slots class (fixed attribute layout, no per-instance __dict__)
class SlottedPoint:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y
# sys.getsizeof may not include a regular instance's attribute storage, and
# exact byte counts vary by CPython version, so measure on your own interpreter.
print(sys.getsizeof(RegularPoint(1, 2)))
print(sys.getsizeof(SlottedPoint(1, 2)))
# 2. Object pooling for expensive objects (this simple pool is not thread-safe)
class ObjectPool:
    def __init__(self, factory, max_size=10):
        self.factory = factory
        self.max_size = max_size
        self.pool = []
    def acquire(self):
        if self.pool:
            return self.pool.pop()
        return self.factory()
    def release(self, obj):
        if len(self.pool) < self.max_size:
            self.pool.append(obj)
# Usage (DatabaseConnection is a placeholder for any expensive-to-create class)
db_pool = ObjectPool(DatabaseConnection, max_size=5)
conn = db_pool.acquire()
try:
    pass  # Use connection
finally:
    db_pool.release(conn)  # Always return the object, even on error
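# 3. Weak references to prevent memory leaks
# Entries vanish once no strong reference to the value remains. Values must
# support weak references, so plain int, str, and tuple objects cannot be stored.
class Cache:
    def __init__(self):
        self._cache = WeakValueDictionary()
    def get(self, key):
        return self._cache.get(key)
    def set(self, key, value):
        self._cache[key] = value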
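# 4. Manual garbage collection for large operations
# large_data is any iterable of batches; process_batch is a caller-supplied function.
def process_large_dataset(large_data, process_batch):
    for batch in large_data:
        process_batch(batch)
        # Force a collection after each batch. Reference counting already frees
        # most objects; this only helps when batches create reference cycles.
        gc.collect()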
# 5. Context managers for resource cleanup
# allocate_resource is a placeholder for whatever acquires the resource.
class ManagedResource:
    def __enter__(self):
        self.resource = allocate_resource()
        return self.resource
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.resource.cleanup()
        return False  # Do not suppress exceptions
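Memory leak detection is listed among the topics for this section. One standard-library way to approach it is tracemalloc, which can compare two snapshots and report the source lines whose allocations grew. The sketch below assumes two hypothetical names, top_allocations and leaky, and simulates a leak with a module-level list that keeps growing:

```python
import tracemalloc


def top_allocations(func, limit=3):
    """Run func() and return (result, largest allocation diffs by source line).

    `top_allocations` is a hypothetical helper name used for illustration.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to returns StatisticDiff entries, largest absolute change first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


_leaky_store = []  # module-level list that is never cleared: a simulated leak


def leaky():
    # Each call keeps another ~1 MB alive through the module-level list.
    _leaky_store.extend(bytearray(1024) for _ in range(1000))
    return len(_leaky_store)


if __name__ == "__main__":
    _, stats = top_allocations(leaky)
    for stat in stats:
        print(stat)
```

In a real investigation, the snapshots would typically be taken before and after a repeated workload, and a line whose size keeps growing across iterations is a candidate leak.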
4. High-Performance Computing
- NumPy vectorization
- Numba JIT compilation
- Cython for C extensions
- Multiprocessing for parallelism
- concurrent.futures
- Performance comparison
Code Example:
import numpy as np
from numba import jit
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
# 1. NumPy vectorization
# Bad: Python loops (slow)
def python_sum(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total
# Good: NumPy vectorization (often one to two orders of magnitude faster)
def numpy_sum(n):
    arr = np.arange(n, dtype=np.int64)  # Fixed-width ints: overflows silently for very large n
    return np.sum(arr ** 2)
# Illustrative timings for n = 1000000 (machine-dependent):
# python_sum ~0.15s, numpy_sum ~0.002s
# 2. Numba JIT compilation
@jit(nopython=True)  # Compile to machine code
def fast_function(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total
# First call: compilation + execution
# Subsequent calls: compiled code only, often tens of times faster than pure Python
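# 3. Multiprocessing for CPU-bound tasks
def cpu_intensive_task(bounds):
    start, stop = bounds
    return sum(i * i for i in range(start, stop))
# The __main__ guard is required where worker processes are spawned (Windows, macOS)
if __name__ == "__main__":
    # Single process
    result = cpu_intensive_task((0, 10000000))
    # Multiple processes: split the range into four disjoint chunks
    chunks = [(i * 2500000, (i + 1) * 2500000) for i in range(4)]
    with ProcessPoolExecutor(max_workers=4) as executor:
        total = sum(executor.map(cpu_intensive_task, chunks))
    assert total == result
    # Up to ~4x speedup on 4 cores, minus process start-up and pickling overhead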
# 4. Caching for expensive computations
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# fibonacci(100) without cache: ~forever
# fibonacci(100) with cache: instant
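# 5. Memory views for zero-copy operations
# Applies to bytes-like objects (bytes, bytearray); NumPy slices are already views.
def process_array(data):
    # Bad: Slicing bytes or a bytearray creates a copy
    subset = data[1000:2000]
    # Good: Zero-copy view
    view = memoryview(data)[1000:2000]
    return view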
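Whether a cache is actually helping can be checked rather than assumed. Functions wrapped with functools.lru_cache expose cache_info() and cache_clear(); the sketch below reuses the fibonacci function from the caching example and inspects the hit and miss counters after one call:

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


if __name__ == "__main__":
    fibonacci.cache_clear()        # start from an empty cache
    print(fibonacci(30))
    # CacheInfo carries hits, misses, maxsize, and currsize.
    print(fibonacci.cache_info())
```

A low hit count relative to misses suggests the arguments rarely repeat, in which case the cache adds memory overhead without saving work.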
Hands-On Practice
Project 1: Performance Profiler
Build a comprehensive profiling tool.
Requirements:
- CPU profiling with cProfile
- Memory profiling
- Line-by-line analysis
- Visualization (flame graphs)
- HTML report generation
- Bottleneck identification
Key Skills: Profiling tools, visualization, analysis
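As a possible starting point for the CPU profiling and bottleneck identification requirements, the sketch below wraps cProfile so that the report is returned as text instead of printed, which makes it easier to embed in an HTML report later. The name profile_call is a hypothetical helper, not an existing API:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Profile one call and return (result, text report of the top functions).

    `profile_call` is a hypothetical helper name used for illustration.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()  # stop profiling even if func raises
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def process_data(n=100_000):
    return sum(i ** 2 for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(process_data)
    print(report)
```

Sorting by cumulative time surfaces the call paths that dominate the run; sorting by "tottime" instead highlights functions that are slow in their own body.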
Project 2: Data Processing Pipeline
Optimize a data processing pipeline.
Requirements:
- Load large CSV files (1GB+)
- Transform and clean data
- Aggregate statistics
- Compare Python/NumPy/Pandas approaches
- Measure memory usage
- Optimize to <2GB RAM
Key Skills: NumPy, memory optimization, benchmarking
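One way to stay within the RAM budget is to stream the file row by row instead of loading it whole. The following is a minimal pure-Python baseline, assuming a hypothetical function named column_mean and a made-up two-column sample; it keeps only a running total and count in memory:

```python
import csv
import io


def column_mean(lines, column):
    """Stream CSV rows and return the mean of one numeric column.

    `lines` is any iterable of CSV text lines, such as an open file object.
    `column_mean` is a hypothetical helper name used for illustration.
    """
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:
        value = row[column]
        if value == "":
            continue  # skip rows where the value is missing
        total += float(value)
        count += 1
    return total / count if count else float("nan")


# Tiny in-memory sample standing in for a large file (made-up data)
sample = io.StringIO("id,price\n1,10.0\n2,\n3,20.0\n")
print(column_mean(sample, "price"))  # prints 15.0
```

For a real file, pass the object returned by open(path, newline="") in place of the StringIO sample. This version can serve as the reference point when comparing against NumPy and Pandas approaches.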
Project 3: Parallel Computing
Implement parallel algorithms.
Requirements:
- Matrix multiplication
- Image processing
- Monte Carlo simulation
- Compare threading/multiprocessing/asyncio
- Measure speedup
- Handle shared state
Key Skills: Parallelism, performance measurement
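The Monte Carlo requirement could be approached along these lines. This sketch, with the hypothetical names count_hits and estimate_pi, estimates pi by sampling points in the unit square. Each task gets its own seeded generator and returns a plain count, so no state is shared between processes:

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(args):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, samples = args
    rng = random.Random(seed)  # one independent, seeded generator per task
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(samples, workers=1):
    """Estimate pi; workers=1 runs in-process, otherwise uses a process pool."""
    per_task = samples // workers
    tasks = [(seed, per_task) for seed in range(workers)]
    if workers == 1:
        hits = count_hits(tasks[0])
    else:
        with ProcessPoolExecutor(max_workers=workers) as executor:
            hits = sum(executor.map(count_hits, tasks))
    return 4.0 * hits / (per_task * workers)


if __name__ == "__main__":
    print(estimate_pi(400_000, workers=1))
    print(estimate_pi(400_000, workers=4))
```

Timing both calls with time.perf_counter gives the speedup figure the project asks for; expect it to fall short of the worker count because of process start-up cost.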
Assessment Criteria
- Profile code to identify bottlenecks
- Choose appropriate data structures
- Optimize algorithms for time complexity
- Manage memory efficiently
- Use vectorization where applicable
- Implement effective caching
- Parallelize CPU-bound operations
Resources
Official Documentation
- Python Performance Tips - Official tips
- NumPy Docs - NumPy documentation
- Numba Docs - JIT compilation
Learning Platforms
- High Performance Python - O'Reilly book
- Python Performance - Real Python guide
- Optimizing Python - PyCon talks
Tools
- cProfile - CPU profiling
- memory_profiler - Memory profiling
- py-spy - Sampling profiler
- Scalene - CPU/GPU/memory profiler
Next Steps
After mastering Python performance, explore:
- Cython - C extensions for Python
- PyPy - Alternative Python interpreter
- Dask - Parallel computing library
- CUDA - GPU programming with Python