Agent skill

eval-notebook

Execute .ipynb notebooks (Python, Kotlin, or any Jupyter kernel) without overwriting; return LLM-friendly JSON with outputs and errors. Use when you need to run or validate a Jupyter notebook.


Install this agent skill in your project

npx add-skill https://github.com/geggo98/dotfiles/tree/main/modules/ai/_files/skills/notebook

SKILL.md

Notebook Evaluator

1. Purpose

Use this skill to execute Jupyter notebooks (.ipynb) without modifying the original file. It runs each notebook with its configured kernel and returns structured JSON containing execution results, captured outputs, and any errors, which suits LLM consumption and automated testing.

Important: Run the script directly (./scripts/eval_notebook.sh). Do not prefix it with bash: the script requires zsh and will fail under bash.

2. Usage Scenarios

Use this skill when:

  • Validating notebook changes in a pull request
  • Testing notebooks in CI/CD pipelines
  • Debugging notebook execution errors
  • Verifying notebook reproducibility

3. Helper Scripts

| Script | Purpose | Arguments |
|---|---|---|
| scripts/eval_notebook.sh | Entry point that delegates to the Python evaluator | Forwards all arguments to eval_notebook.py |

The wrapper script enforces a global execution timeout via gtimeout (default: 15m). Pass --timeout DURATION to override it. The duration format follows GNU coreutils (e.g. 30s, 5m, 1h). This is separate from the per-notebook --timeout SECONDS option handled by the Python evaluator.
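The duration format mentioned above can be illustrated with a small parser. This is a sketch of GNU coreutils duration semantics (a number with an optional s, m, h, or d suffix, seconds when no suffix is given), not code taken from the skill; `parse_duration` is a hypothetical helper name.

```python
# Hypothetical helper: converts a GNU coreutils style duration to seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> float:
    """Return the number of seconds described by text such as '30s', '5m', '1h'."""
    text = text.strip()
    if not text:
        raise ValueError("empty duration")
    unit = text[-1]
    if unit in _UNIT_SECONDS:
        number, factor = text[:-1], _UNIT_SECONDS[unit]
    else:
        number, factor = text, 1  # no suffix means seconds
    return float(number) * factor


print(parse_duration("15m"))  # the wrapper's documented default, in seconds
```

A helper like this is handy for checking that the global wrapper timeout is larger than the per-notebook timeout before launching a long run.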

Arguments

  • Required: One or more paths to .ipynb notebook files
  • Optional: See CLI options below

4. CLI Options

| Option | Default | Description |
|---|---|---|
| --timeout SECONDS | 600 | Maximum execution time per notebook |
| --iopub-timeout SECONDS | 30 | Timeout for IOPUB messages |
| --fail-fast | false | Stop on first error instead of continuing |
| --max-output-chars N | 4000 | Truncate outputs after N characters |
| --max-outputs-per-cell N | 6 | Limit outputs captured per cell |
| --pretty | false | Pretty-print JSON output |

Warning: Notebook cells can produce huge outputs, for example when rendering diagrams. Always choose sensible limits for individual cells with --max-output-chars and --max-outputs-per-cell.
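When the evaluator is driven from Python rather than a shell, the options above can be assembled into an argument list. The following is a minimal sketch, assuming the script path used in the examples; `build_command` and `run_evaluator` are hypothetical helper names, and only flags listed in the table are used.

```python
import subprocess

SCRIPT = "./scripts/eval_notebook.sh"  # path from the examples; adjust to your checkout


def build_command(notebooks, max_output_chars=None, max_outputs_per_cell=None,
                  fail_fast=False, pretty=False):
    """Build the argv list. The script itself is argv[0], never 'bash'."""
    if not notebooks:
        raise ValueError("at least one notebook path is required")
    cmd = [SCRIPT, *notebooks]
    if max_output_chars is not None:
        cmd += ["--max-output-chars", str(max_output_chars)]
    if max_outputs_per_cell is not None:
        cmd += ["--max-outputs-per-cell", str(max_outputs_per_cell)]
    if fail_fast:
        cmd.append("--fail-fast")
    if pretty:
        cmd.append("--pretty")
    return cmd


def run_evaluator(notebooks, **options):
    """Run the wrapper directly and return (exit_code, stdout). Not called here."""
    proc = subprocess.run(build_command(notebooks, **options),
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout


print(build_command(["analysis.ipynb"], max_output_chars=1000, max_outputs_per_cell=2))
```

Tightening both limits keeps the JSON small for notebooks that emit plots or large tables.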

5. Examples

Basic Evaluation

```bash
./scripts/eval_notebook.sh analysis.ipynb --pretty
```

Executes the notebook and returns pretty-printed JSON with results.

Multiple Notebooks

```bash
./scripts/eval_notebook.sh notebook1.ipynb notebook2.ipynb
```

Returns an array of result objects, one per notebook.
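Code that consumes the output may want to treat both cases uniformly. The sketch below assumes that a single notebook yields one JSON object and several notebooks yield an array, as this document describes; `normalize_results` is a hypothetical helper, and the sample strings are illustrative rather than real evaluator output.

```python
import json


def normalize_results(stdout_text: str) -> list:
    """Parse evaluator stdout and always return a list of result objects."""
    parsed = json.loads(stdout_text)
    if isinstance(parsed, dict):
        return [parsed]
    if isinstance(parsed, list):
        return parsed
    raise ValueError(f"unexpected top-level JSON type: {type(parsed).__name__}")


# Illustrative inputs only
single = '{"notebook": "/path/to/notebook1.ipynb", "status": "ok"}'
many = '[{"notebook": "a.ipynb", "status": "ok"}, {"notebook": "b.ipynb", "status": "error"}]'

for result in normalize_results(single) + normalize_results(many):
    print(result["notebook"], result["status"])
```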

Strict Evaluation

```bash
./scripts/eval_notebook.sh analysis.ipynb --fail-fast --timeout 120
```

Stops immediately on any error with a 2-minute timeout.

6. Output Format

Single Notebook Result

```json
{
  "notebook": "/path/to/notebook.ipynb",
  "cwd": "/path/to",
  "kernelspec": "python3",
  "status": "ok",
  "duration_ms": 1234,
  "exec_exception": null,
  "error_count": 0,
  "errors": [],
  "cells": [
    {
      "index": 0,
      "execution_count": 1,
      "source_preview": "print('hello')",
      "outputs": [
        {
          "type": "stream",
          "name": "stdout",
          "text": "hello\n"
        }
      ],
      "output_count": 1
    }
  ]
}
```
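A result object of this shape can be condensed into a one-line summary and its stdout text. This is a minimal sketch that relies only on the fields shown in the sample above; `summarize` and `stdout_text` are hypothetical helper names.

```python
import os


def summarize(result: dict) -> str:
    """One-line summary built from the top-level fields of a result object."""
    name = os.path.basename(result["notebook"])
    seconds = result["duration_ms"] / 1000
    cells = result.get("cells", [])
    return (f"{name}: status={result['status']}, {len(cells)} cell(s), "
            f"{result['error_count']} error(s), {seconds:.3f} s")


def stdout_text(result: dict) -> str:
    """Concatenate all stdout stream output across cells, in cell order."""
    parts = []
    for cell in result.get("cells", []):
        for out in cell.get("outputs", []):
            if out.get("type") == "stream" and out.get("name") == "stdout":
                parts.append(out.get("text", ""))
    return "".join(parts)


# The sample result from this section, trimmed to the fields used here
sample = {
    "notebook": "/path/to/notebook.ipynb",
    "status": "ok",
    "duration_ms": 1234,
    "error_count": 0,
    "errors": [],
    "cells": [{
        "index": 0,
        "outputs": [{"type": "stream", "name": "stdout", "text": "hello\n"}],
        "output_count": 1,
    }],
}

print(summarize(sample))
print(stdout_text(sample), end="")
```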

Cell Output Types

| Type | Fields | Description |
|---|---|---|
| stream | name, text | Standard output/error streams |
| execute_result | mime, text | Last expression result |
| display_data | mime, text or image_base64_len | Rich display (images, HTML) |
| error | ename, evalue, traceback | Python exception |
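A consumer can dispatch on the type field to render each output compactly. The sketch below uses only the fields listed in the table; `describe_output` is a hypothetical helper, and reading image_base64_len as the length of an omitted image payload is an assumption based on the field name.

```python
def describe_output(output: dict) -> str:
    """Render one cell output as a short, single-line description."""
    kind = output.get("type")
    if kind == "stream":
        return f"[{output.get('name')}] {output.get('text', '').rstrip()}"
    if kind == "execute_result":
        return f"[result {output.get('mime')}] {output.get('text', '').rstrip()}"
    if kind == "display_data":
        if "image_base64_len" in output:
            # Assumption: the image itself is omitted and only its length is reported
            return (f"[display {output.get('mime')}] "
                    f"image payload, length {output['image_base64_len']}")
        return f"[display {output.get('mime')}] {output.get('text', '').rstrip()}"
    if kind == "error":
        return f"[error] {output.get('ename')}: {output.get('evalue')}"
    return f"[unknown type {kind!r}]"


# Illustrative outputs, one per type
examples = [
    {"type": "stream", "name": "stdout", "text": "hello\n"},
    {"type": "execute_result", "mime": "text/plain", "text": "42"},
    {"type": "display_data", "mime": "image/png", "image_base64_len": 20480},
    {"type": "error", "ename": "ValueError", "evalue": "bad input", "traceback": []},
]
for item in examples:
    print(describe_output(item))
```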

7. Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success (notebook executed, may contain errors in results) |
| 1 | Script error (invalid arguments, file not found) |

Note: Cell execution errors are reported in the JSON output; the script itself succeeds if it can evaluate the notebook.
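Because exit code 0 does not imply a clean run, a caller has to check both the exit code and the status field. The sketch below shows one way to do that; `classify` and its three labels are hypothetical, and any non-zero exit code is treated as a script-level failure.

```python
import json


def classify(exit_code: int, stdout_text: str) -> str:
    """Distinguish script failures from notebook failures reported in the JSON."""
    if exit_code != 0:
        return "script-error"  # e.g. invalid arguments or file not found
    parsed = json.loads(stdout_text)
    results = parsed if isinstance(parsed, list) else [parsed]
    if any(r.get("status") != "ok" for r in results):
        return "notebook-error"  # the script ran, but a notebook reported errors
    return "ok"


print(classify(1, ""))
print(classify(0, '{"status": "ok"}'))
print(classify(0, '[{"status": "ok"}, {"status": "error"}]'))
```

In a CI job, both "script-error" and "notebook-error" would normally fail the build, since the wrapper's exit code alone does not surface failing cells.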

8. Your Task

When processing evaluation results:

  1. If status=ok: Provide a concise summary of key outputs and execution time.

  2. If status=error:

    • List each error by cell_index with ename, evalue, and relevant traceback lines
    • Identify the most likely root cause
    • Propose the fastest verification step
    • If code changes are needed, describe them precisely

  3. Never overwrite the original notebook file; this skill is read-only by design.
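The error-listing step can be sketched as a small formatter. The shape of entries in the errors array is not shown in this document, so the sketch assumes each entry carries cell_index, ename, evalue, and a traceback given as a list of lines (or a single string); `format_errors` is a hypothetical helper and the sample data is illustrative, not real evaluator output.

```python
def format_errors(result: dict, traceback_lines: int = 3) -> str:
    """List each error by cell with its name, value, and last traceback lines."""
    if result.get("status") == "ok":
        return "no errors"
    lines = []
    for err in result.get("errors", []):
        lines.append(
            f"cell {err.get('cell_index')}: {err.get('ename')}: {err.get('evalue')}"
        )
        tb = err.get("traceback") or []
        if isinstance(tb, str):
            tb = tb.splitlines()
        # The innermost frames are usually the most relevant
        for tb_line in tb[-traceback_lines:]:
            lines.append(f"    {tb_line}")
    return "\n".join(lines)


# Illustrative input only; field layout is an assumption
failed = {
    "status": "error",
    "errors": [{
        "cell_index": 3,
        "ename": "NameError",
        "evalue": "name 'df' is not defined",
        "traceback": ["line a", "line b", "line c", "line d"],
    }],
}

print(format_errors(failed, traceback_lines=2))
```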
