Agent skill
notebook-writer
Create and document Jupyter notebooks for reproducible analyses
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/notebook-writer
SKILL.md
Notebook Writer Skill
You are a specialist in creating well-structured Jupyter notebooks for scientific analyses and documentation.
When to Use This Skill
Use this skill when:
- Creating parameter sweeps or sensitivity analyses
- Documenting calculations with reproducible code
- Generating analysis reports that combine code, results, and interpretation
- Packaging agent work (Calculator, Researcher) into shareable notebooks
Notebook Format: Jupytext Markdown
We use Jupytext-compatible Markdown for notebooks to enable git-friendly version control.
Cell Markers
- Markdown cells: Regular Markdown text (no special marker)
- Code cells: Start with
# %%on its own line
Example Structure
---
jupyter:
kernelspec:
display_name: Python 3
language: python
name: python3
---
# Analysis Title
Brief description of what this notebook does.
## Section 1: Data Loading
# %%
import pandas as pd
import numpy as np
# %%
data = pd.read_csv('data.csv')
data.head()
## Section 2: Analysis
Explanation of the analysis approach.
# %%
# Perform calculation
result = np.mean(data['value'])
print(f"Mean: {result:.2f}")
Python Utility API
Many projects provide src/utils/notebook_builder.py with helper functions for programmatic notebook creation.
Core Function: create_notebook_markdown
create_notebook_markdown(
title: str,
cells: List[Dict[str, str]],
output_path: Path,
kernelspec: Optional[Dict] = None
) -> Path
Parameters:
title: Notebook title (becomes H1 header)cells: List of dicts with'type'('code'or'markdown') and'content'output_path: Where to save.mdfilekernelspec: Optional kernel specification (defaults to Python 3)
Example:
from pathlib import Path
from src.utils.notebook_builder import create_notebook_markdown
cells = [
{'type': 'markdown', 'content': '## Introduction\n\nThis analysis...'},
{'type': 'code', 'content': 'import numpy as np'},
{'type': 'code', 'content': 'x = np.linspace(0, 10)\nprint(x)'}
]
create_notebook_markdown(
title="My Analysis",
cells=cells,
output_path=Path('docs/analysis/my_analysis.md')
)
Template: Parameter Sweep
create_parameter_sweep_notebook(
param_name: str,
param_range: str,
calculation_code: str,
output_path: Path
) -> Path
Creates a notebook with:
- Imports (numpy, matplotlib, pandas)
- Parameter range definition
- Your calculation code
- Visualization boilerplate
Example:
from pathlib import Path
from src.utils.notebook_builder import create_parameter_sweep_notebook
create_parameter_sweep_notebook(
param_name='temperature',
param_range='np.linspace(20, 40, 20)',
calculation_code='''
# Reaction rate calculation
results = []
for T in temperature_values:
rate = arrhenius_equation(T, activation_energy)
results.append(rate)
''',
output_path=Path('analysis/temperature_sweep.md')
)
Template: Analysis Report
create_analysis_report_notebook(
analysis_title: str,
sections: List[Dict[str, str]],
output_path: Path
) -> Path
Section dict keys:
title: Section heading (required)description: Explanatory text (optional)code: Code to execute (optional)interpretation: Results interpretation (optional)
Example:
from src.utils.notebook_builder import create_analysis_report_notebook
sections = [
{
'title': 'Model Setup',
'description': 'Define parameters',
'code': 'diffusion_coeff = 2.1e-5 # cm²/s'
},
{
'title': 'Calculation',
'code': 'result = compute_model(diffusion_coeff)',
'interpretation': 'Result shows X is dominated by Y'
}
]
create_analysis_report_notebook(
'Transport Analysis',
sections,
Path('analysis/transport.md')
)
Validation Function
validate_notebook(notebook_path: Path) -> bool
Validates .ipynb structure using nbformat. Returns True if valid, raises exception if invalid.
Example:
from pathlib import Path
from src.utils.notebook_builder import validate_notebook
validate_notebook(Path('analysis/notebook.ipynb'))
# Returns True or raises ValidationError
Workflow
- Create notebook using utility or manual Markdown
- Edit
.mdfile directly (agents write Markdown well) - Convert to
.ipynb:python3 -m jupytext --to ipynb notebook.md - Run in Jupyter:
jupyter notebook notebook.ipynb - Sync changes back:
python3 -m jupytext --sync notebook.ipynb(bidirectional)
Jupyter AI Integration
Modern Jupyter environments (JupyterLab 4.0+, JetBrains IDEs) provide AI-powered assistance to enhance productivity and reduce errors.
%%ai Magic Commands
The %%ai cell magic enables AI-powered code generation and analysis directly in notebooks:
# %%
# %load_ext jupyter_ai_magics
# %%
%%ai chatgpt
Generate a function to calculate the Pearson correlation coefficient between two arrays
Key use cases:
- Code generation: Generate boilerplate code, data transformations, or analysis functions
- Data exploration: Ask questions about DataFrames or arrays
- Debugging assistance: Get suggestions for fixing errors
- Documentation: Generate docstrings or explanations
Providing Context for Better Results
AI assistants work best when given relevant context. Always provide:
-
API documentation: For specialized libraries (scanpy, pydeseq2, biopython)
python# Include relevant API documentation in a markdown cell # Example: scanpy.pp.filter_cells(data, min_genes=200) -
Dataset descriptions: Shape, columns, data types
python# Document your data structure: # RNA-seq counts matrix: 20,000 genes × 5,000 cells # AnnData object: .X (sparse CSR matrix), .obs (cell metadata), .var (gene metadata) -
Domain context: Biological meaning, expected ranges, units
python# Oxygen consumption rate: 10-20 pmol/s/million cells # Temperature: 37°C, pH: 7.4
Chat UI Assistance
JupyterLab's chat interface provides conversational help:
Best practices:
- Use for exploratory questions: "What's the best way to normalize this data?"
- Ask for code review: "Does this analysis handle missing values correctly?"
- Request visualizations: "Create a heatmap of the top 50 variable genes"
- Get explanations: "Explain what this cell is doing"
When to Use AI Assistance vs. Manual Coding
Use AI assistance for:
- Boilerplate code (imports, data loading templates)
- Exploratory analysis (quick plots, summary statistics)
- Learning new library syntax
- Generating test data or examples
Write code manually for:
- Core analysis logic (hypothesis testing, modeling)
- Publication-quality figures (fine-grained control needed)
- Performance-critical sections (AI-generated code may not be optimal)
- Complex domain-specific algorithms
Warning: Always verify AI-generated code. Check for:
- Correct library syntax (APIs change frequently)
- Appropriate statistical methods (AI may suggest invalid tests)
- Proper handling of biological data (species, units, measurement context)
JetBrains AI Assistant
For notebooks in PyCharm/DataSpell:
Features:
- Explain cell: Understand what code does (Alt+Enter → "Explain")
- Create visualization: Generate plots from data descriptions
- Edit cell: Refactor or improve code (Alt+Enter → "AI Actions")
- Fix errors: Get suggestions for runtime errors
Access: Right-click cell → "AI Assistant" or use AI chat sidebar
Jupytext Configuration
Projects should include .jupytext.toml in repository root:
# Jupytext configuration
# Enables git-friendly notebook version control
# Pair markdown and ipynb files
# Use myst format which supports # %% cell markers
formats = "md:myst,ipynb"
This tells Jupytext to:
- Recognize
.mdfiles as notebooks - Use MyST Markdown format (supports
# %%markers) - Auto-sync with
.ipynbwhen either is modified
Git Tracking Strategy
Recommended .gitignore configuration:
# Track .md notebooks (Jupytext source), ignore generated .ipynb files
*.ipynb
.ipynb_checkpoints/
What's tracked:
- ✅
.mdnotebook files (human-readable source) - ❌
.ipynbfiles (generated, binary JSON) - ❌
.ipynb_checkpoints/(Jupyter temp files)
Rationale: .md files produce readable git diffs. .ipynb files are JSON with embedded outputs and can be regenerated from .md.
Common Operations
Create from scratch (manual)
- Write Markdown file with
# %%markers - Add YAML frontmatter (kernel info)
- Convert:
python3 -m jupytext --to ipynb file.md
Convert existing notebook to Markdown
python3 -m jupytext --to md:myst notebook.ipynb
Edit existing notebook
Option 1: Edit .md file directly (recommended for agents)
# Edit notebook.md in text editor
# Then convert:
python3 -m jupytext --to ipynb notebook.md
Option 2: Edit in Jupyter, sync back
jupyter notebook notebook.ipynb
# Make changes in Jupyter
# Sync back to .md:
python3 -m jupytext --sync notebook.ipynb
Validate structure
python3 -c "
from pathlib import Path
from src.utils.notebook_builder import validate_notebook
validate_notebook(Path('notebook.ipynb'))
print('✓ Valid')
"
Convert multiple notebooks
# Convert all .md notebooks in a directory
python3 -m jupytext --to ipynb analysis/*.md
# Or sync all paired notebooks
python3 -m jupytext --sync analysis/*.ipynb
Best Practices
- Title every notebook with clear purpose
- Start with imports in first code cell
- Explain calculations with markdown cells before code
- Interpret results with markdown cells after code
- Use meaningful variable names (not x, y, z)
- Include units in comments and axis labels
- Save outputs (figures) to files for documentation
Reproducibility Standards
Scientific notebooks must be fully reproducible. Every notebook should enable another researcher to:
- Recreate your computational environment
- Rerun your analysis and get identical results
- Understand your data sources and processing steps
Environment Documentation
Every notebook must include an environment documentation cell:
# %%
# Environment Information
# Run: pip freeze > requirements.txt
# Or: conda env export > environment.yml
import sys
import numpy as np
import pandas as pd
import scanpy as sc # Example for single-cell analysis
print(f"Python: {sys.version}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scanpy: {sc.__version__}")
# Include this output in your notebook for documentation
Create environment files:
# For pip users:
pip freeze > requirements.txt
# For conda users:
conda env export > environment.yml
# Include these files in your repository
Document kernel selection:
## Computational Environment
- **Kernel**: Python 3.11 (project-env)
- **Dependencies**: See `requirements.txt` for full package list
- **Critical packages**: scanpy==1.10.0, numpy==1.26.3, pandas==2.2.0
Random Seed Setting
For any stochastic process, set random seeds:
# %%
# Set random seeds for reproducibility
import numpy as np
import random
RANDOM_SEED = 42 # Document why this value was chosen (convention, previous analysis, etc.)
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
# For machine learning:
import torch
torch.manual_seed(RANDOM_SEED)
# For scanpy:
import scanpy as sc
sc.settings.seed = RANDOM_SEED
print(f"Random seed set to {RANDOM_SEED}")
Stochastic processes requiring seeds:
- UMAP, t-SNE (dimensionality reduction)
- Random forest, neural networks (machine learning)
- Monte Carlo simulations
- Random sampling or bootstrapping
- Graph algorithms with random initialization
Session Info Output
End every notebook with a session info cell:
# %%
# Session Information (for reproducibility)
import session_info
session_info.show(
dependencies=True,
html=False
)
# Alternative for single-cell analysis:
# import scanpy as sc
# sc.logging.print_versions()
This captures:
- Python version
- Operating system
- Package versions (all dependencies)
- Execution timestamp
File Path Best Practices
Use relative paths and variables:
# %%
from pathlib import Path
# Define paths at the top of the notebook
DATA_DIR = Path("data/raw")
RESULTS_DIR = Path("results/analysis_2025-01-29")
# Ensure output directories exist
RESULTS_DIR.mkdir(parents=True, exist_ok=True)
# Use variables throughout
input_file = DATA_DIR / "counts.csv"
output_file = RESULTS_DIR / "normalized_counts.csv"
Never use hardcoded absolute paths:
# BAD:
data = pd.read_csv("/Users/yourname/project/data.csv")
# GOOD:
data = pd.read_csv(DATA_DIR / "data.csv")
Reproducibility Checklist
Before sharing or archiving a notebook:
- Environment documented (Python version, key package versions)
-
requirements.txtorenvironment.ymlexists and is current - Random seeds set for all stochastic processes
- Session info cell at end of notebook
- File paths use variables (not hardcoded)
- Data sources documented (where to download, version, date)
- Notebook runs end-to-end without errors (Restart & Run All)
- Results match expected output (if re-running existing analysis)
Integration with other skills:
- notebook-debugger: Use to verify end-to-end execution
- bioinformatician: Apply reproducibility standards to all computational biology analyses
- copilot: Review notebooks for reproducibility compliance
Project-Specific Usage
Many projects have a docs/NOTEBOOK-WORKFLOW.md or similar document with project-specific examples and patterns. Check your project's documentation for:
- Domain-specific notebook templates
- Agent integration patterns (which skills create notebooks)
- Directory structure conventions (where to save notebooks)
- Project-specific best practices
Troubleshooting
Issue: Jupytext can't find format
Error: Format 'percent' is not associated to extension '.md'
Fix: Use md:myst format in .jupytext.toml (not md:percent). MyST Markdown supports # %% markers.
formats = "md:myst,ipynb" # Correct
Issue: Sync not working
Symptom: Changes to .ipynb don't appear in .md
Solution:
- Check
.jupytext.tomlexists and has correct format - Run sync explicitly:
python3 -m jupytext --sync notebook.ipynb - Check both files exist (create
.mdfirst if needed)
Issue: Validation fails
Error: nbformat.ValidationError
Causes:
- Missing cell IDs (required in nbformat v4.5+)
- Invalid JSON structure
- Missing required fields
Solution: Use notebook_builder.py utility functions which handle validation automatically.
Issue: Git shows .ipynb files
Symptom: .ipynb files appearing in git status
Fix: Ensure .gitignore contains *.ipynb. Check with:
git check-ignore -v notebook.ipynb
Error Prevention
Common Issues
- Missing
# %%: Code cells must start with this marker - Frontmatter syntax: YAML header must be exact (see example structure above)
- Path handling: Use
Pathobjects, ensure directories exist - Cell validation: Use
notebook_builder.validate_notebook()after creation
Validation Checklist
Before finalizing a notebook:
- Frontmatter present with kernelspec
- All code cells have
# %%marker - Imports in first code cell
- Results interpreted with markdown
- Saved to appropriate location
- Validated with nbformat if creating .ipynb directly
Dependencies
Required packages:
pip3 install jupytext nbformat
Check installation:
pip3 list | grep -E "(jupytext|nbformat)"
Tested versions:
- jupytext: 1.19+
- nbformat: 5.10+
- Python: 3.9+
Integration with Other Skills
Common patterns for skill integration:
- Quantitative analysis skills: Package calculations as reproducible notebooks with parameter sweeps
- Research skills: Document literature-derived parameters with citations in data notebooks
- Planning skills: Generate protocol notebooks with expected results and analysis templates
- Review skills: Check notebook code for correctness and best practices
Remember: Notebooks are for interactive exploration and reproducible documentation. For production code, use Python modules in src/.
For project-specific examples and patterns, see your project's documentation (often docs/NOTEBOOK-WORKFLOW.md or similar).
Didn't find tool you were looking for?