Agent skill
eval-notebook
Execute .ipynb notebooks (Python, Kotlin, or any Jupyter kernel) without overwriting; return LLM-friendly JSON with outputs and errors. Use when you need to run or validate a Jupyter notebook.
Install this agent skill to your Project
npx add-skill https://github.com/geggo98/dotfiles/tree/main/modules/ai/_files/skills/notebook
SKILL.md
Notebook Evaluator
1. Purpose
Use this skill to execute Jupyter notebooks (.ipynb) safely without modifying the original file. It evaluates notebooks using their configured kernel and returns structured JSON output with execution results, captured outputs, and any errors—perfect for LLM consumption and automated testing.
Important: Run the script directly (`./scripts/eval_notebook.sh`). Do not prefix it with `bash`; the script requires zsh and will fail under bash.
2. Usage Scenarios
Run before:
- Validating notebook changes in a pull request
- Testing notebooks in CI/CD pipelines
- Debugging notebook execution errors
- Verifying notebook reproducibility
3. Helper Scripts
| Script | Purpose | Arguments |
|---|---|---|
| `scripts/eval_notebook.sh` | Entry point that delegates to the Python evaluator | Forwards all arguments to `eval_notebook.py` |
The wrapper script enforces a global execution timeout via gtimeout (default: 15m). Pass --timeout DURATION to override it. The duration format follows GNU coreutils (e.g. 30s, 5m, 1h). This is separate from the per-notebook --timeout SECONDS option handled by the Python evaluator.
Arguments
- Required: One or more paths to `.ipynb` notebook files
- Optional: See CLI options below
4. CLI Options
| Option | Default | Description |
|---|---|---|
| `--timeout SECONDS` | 600 | Maximum execution time per notebook |
| `--iopub-timeout SECONDS` | 30 | Timeout for IOPUB messages |
| `--fail-fast` | false | Stop on first error instead of continuing |
| `--max-output-chars N` | 4000 | Truncate outputs after N characters |
| `--max-outputs-per-cell N` | 6 | Limit outputs captured per cell |
| `--pretty` | false | Pretty-print JSON output |
Warning: Notebook cells can produce huge output, e.g. when rendering diagrams. Always make sure individual cells produce a sensible amount of output.
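To picture what the two output limits do, here is a minimal sketch. It is only an illustration of capping outputs, not the evaluator's actual code: the function name `cap_outputs` and the `truncated` flag are hypothetical, and only the default values (4000 and 6) come from the table above.

```python
def cap_outputs(outputs, max_output_chars=4000, max_outputs_per_cell=6):
    """Illustrative only: keep at most N outputs, each with at most M chars of text."""
    capped = []
    for out in outputs[:max_outputs_per_cell]:
        out = dict(out)  # copy so the caller's data is left untouched
        text = out.get("text")
        if isinstance(text, str) and len(text) > max_output_chars:
            out["text"] = text[:max_output_chars]
            out["truncated"] = True  # hypothetical marker, not a documented field
        capped.append(out)
    return capped


# Eight very long stream outputs, as a chatty cell might produce.
noisy = [{"type": "stream", "name": "stdout", "text": "x" * 10_000}] * 8
small = cap_outputs(noisy, max_output_chars=100, max_outputs_per_cell=2)
print(len(small), len(small[0]["text"]))  # prints: 2 100
```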
5. Examples
Basic Evaluation
./scripts/eval_notebook.sh analysis.ipynb --pretty
Executes the notebook and returns pretty-printed JSON with results.
Multiple Notebooks
./scripts/eval_notebook.sh notebook1.ipynb notebook2.ipynb
Returns an array of result objects, one per notebook.
Strict Evaluation
./scripts/eval_notebook.sh analysis.ipynb --fail-fast --timeout 120
Stops immediately on any error with a 2-minute timeout.
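When the script is driven from Python rather than a shell, the same invocations can be assembled as an argument list. The sketch below is a hedged illustration: `build_command` and `run_notebooks` are hypothetical helper names, the relative script path assumes the current directory is the skill root, and only options listed in the CLI table are used.

```python
import subprocess


def build_command(notebooks, fail_fast=False, pretty=False, timeout=None):
    """Assemble argv for the wrapper script using only documented options."""
    # Call the script directly; it requires zsh, so never prepend "bash".
    cmd = ["./scripts/eval_notebook.sh", *notebooks]
    if fail_fast:
        cmd.append("--fail-fast")
    if pretty:
        cmd.append("--pretty")
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    return cmd


def run_notebooks(notebooks, **options):
    """Run the script and return (exit code, stdout text)."""
    proc = subprocess.run(
        build_command(notebooks, **options), capture_output=True, text=True
    )
    return proc.returncode, proc.stdout


# Mirrors the "Strict Evaluation" example.
print(build_command(["analysis.ipynb"], fail_fast=True, timeout=120))
# prints: ['./scripts/eval_notebook.sh', 'analysis.ipynb', '--fail-fast', '--timeout', '120']
```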
6. Output Format
Single Notebook Result
{
  "notebook": "/path/to/notebook.ipynb",
  "cwd": "/path/to",
  "kernelspec": "python3",
  "status": "ok",
  "duration_ms": 1234,
  "exec_exception": null,
  "error_count": 0,
  "errors": [],
  "cells": [
    {
      "index": 0,
      "execution_count": 1,
      "source_preview": "print('hello')",
      "outputs": [
        {
          "type": "stream",
          "name": "stdout",
          "text": "hello\n"
        }
      ],
      "output_count": 1
    }
  ]
}
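A result like the one above can be consumed with the standard `json` module. The following is a minimal sketch, assuming the script's stdout holds either a single result object or an array of them; `load_results` and `summarize` are hypothetical helper names, and the sample string is a trimmed copy of the result shown above.

```python
import json


def load_results(raw):
    """Parse script output; always return a list of result dicts."""
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def summarize(result):
    """One-line summary built from fields shown in the sample result."""
    n_outputs = sum(cell.get("output_count", 0) for cell in result.get("cells", []))
    return (
        f"{result['notebook']}: status={result['status']}, "
        f"{result['error_count']} error(s), {n_outputs} output(s), "
        f"{result['duration_ms']} ms"
    )


sample = """{"notebook": "/path/to/notebook.ipynb", "status": "ok",
 "duration_ms": 1234, "error_count": 0, "errors": [],
 "cells": [{"index": 0, "output_count": 1, "outputs": []}]}"""

for result in load_results(sample):
    print(summarize(result))
# prints: /path/to/notebook.ipynb: status=ok, 0 error(s), 1 output(s), 1234 ms
```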
Cell Output Types
| Type | Fields | Description |
|---|---|---|
| `stream` | `name`, `text` | Standard output/error streams |
| `execute_result` | `mime`, `text` | Last expression result |
| `display_data` | `mime`, `text` or `image_base64_len` | Rich display (images, HTML) |
| `error` | `ename`, `evalue`, `traceback` | Python exception |
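Because each output type carries different fields, a consumer usually branches on `type`. Here is a small sketch of such a dispatcher; `describe_output` is a hypothetical helper, the label format is arbitrary, and the field names are taken from the table above.

```python
def describe_output(out):
    """Render one cell output as a short, single-line description."""
    kind = out.get("type")
    if kind == "stream":
        return f"[{out.get('name')}] {out.get('text', '').rstrip()}"
    if kind == "execute_result":
        return f"[result {out.get('mime')}] {out.get('text', '').rstrip()}"
    if kind == "display_data":
        # Images are reported by length only, so there is no text to show.
        if "image_base64_len" in out:
            return f"[display {out.get('mime')}] image (base64 length {out['image_base64_len']})"
        return f"[display {out.get('mime')}] {out.get('text', '').rstrip()}"
    if kind == "error":
        return f"[error] {out.get('ename')}: {out.get('evalue')}"
    return f"[unknown type {kind!r}]"


print(describe_output({"type": "stream", "name": "stdout", "text": "hello\n"}))
# prints: [stdout] hello
print(describe_output({"type": "display_data", "mime": "image/png", "image_base64_len": 2048}))
# prints: [display image/png] image (base64 length 2048)
```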
7. Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (notebook executed, may contain errors in results) |
| 1 | Script error (invalid arguments, file not found) |
Note: Cell execution errors are reported in the JSON output; the script itself succeeds if it can evaluate the notebook.
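Since exit code 0 does not imply a clean notebook, a caller has to check both the exit code and the JSON. A minimal sketch of that two-step check follows; `classify_run` and its three labels are hypothetical, and it assumes stdout contains either one result object or an array of them.

```python
import json


def classify_run(returncode, stdout):
    """Return 'script-error', 'notebook-error', or 'ok'."""
    # A non-zero exit code means the script itself failed; stdout may not be JSON.
    if returncode != 0:
        return "script-error"
    data = json.loads(stdout)
    results = data if isinstance(data, list) else [data]
    # Exit code 0 only says the notebook was evaluated; inspect each result.
    if any(r.get("status") != "ok" or r.get("error_count", 0) > 0 for r in results):
        return "notebook-error"
    return "ok"


print(classify_run(1, ""))                                      # prints: script-error
print(classify_run(0, '{"status": "ok", "error_count": 0}'))    # prints: ok
print(classify_run(0, '{"status": "error", "error_count": 2}')) # prints: notebook-error
```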
8. Your Task
When processing evaluation results:
- If status=ok: Provide a concise summary of key outputs and execution time.
- If status=error:
  - List each error by `cell_index` with `ename`, `evalue`, and relevant traceback lines
  - Identify the most likely root cause
  - Propose the fastest verification step
  - If code changes are needed, describe them precisely
- Never overwrite the original notebook file; this skill is read-only by design.
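The error-listing step can be sketched as a small formatter. This is an illustration under assumptions: the sample result above shows an empty `errors` array, so the shape of its entries (`cell_index`, `ename`, `evalue`, and `traceback` as a list of strings) is inferred from the task description, and `format_errors` and the sample data are hypothetical.

```python
def format_errors(result, max_traceback_lines=3):
    """List each error by cell index, with the tail of its traceback."""
    lines = []
    for err in result.get("errors", []):
        lines.append(
            f"cell {err.get('cell_index')}: {err.get('ename')}: {err.get('evalue')}"
        )
        # The last lines of a traceback are usually the most relevant ones.
        traceback = err.get("traceback") or []
        for tb_line in traceback[-max_traceback_lines:]:
            lines.append(f"    {tb_line}")
    return "\n".join(lines)


# Hypothetical failing result, for illustration only.
failed = {
    "status": "error",
    "error_count": 1,
    "errors": [
        {
            "cell_index": 3,
            "ename": "NameError",
            "evalue": "name 'df' is not defined",
            "traceback": [
                "Traceback (most recent call last):",
                "Cell In[4], line 1",
                "NameError: name 'df' is not defined",
            ],
        }
    ],
}

print(format_errors(failed, max_traceback_lines=2))
```

The report covers only the first bullet of the error branch; identifying the root cause and proposing a verification step still require reading the surrounding cells.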