Agent skill

slurm-job-script-generator

Generate correct, copy-pasteable SLURM sbatch job scripts and sanity-check HPC resource requests — configure nodes, MPI tasks, OpenMP threads, memory (per-node or per-cpu), GPUs, walltime, partitions, modules, and environment variables, with automatic detection of conflicting directives and oversubscription. Use when preparing a SLURM submission script, deciding between pure MPI and hybrid MPI+OpenMP layouts, standardizing #SBATCH directives across a team, debugging why a job won't launch or gets killed, or setting up GPU-accelerated simulation jobs, even if the user only says "I need to run this on the cluster" or "my job keeps getting killed."


Install this agent skill into your project:

npx add-skill https://github.com/HeshamFS/materials-simulation-skills/tree/main/skills/hpc-deployment/slurm-job-script-generator

Metadata

author: HeshamFS
version: 1.1.0
eval cases: 2
tested with: claude-code, gemini-cli, vs-code-copilot
last reviewed: 2026-03-26
security tier: high
security reviewed: YES

SKILL.md

SLURM Job Script Generator

Goal

Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes such as a malformed walltime, conflicting memory flags, or a layout that likely oversubscribes the node's cores.

Requirements

  • Python 3.8+
  • No external dependencies (Python standard library only)
  • Works on Linux, macOS, and Windows (script generation only)

Inputs to Gather

| Input | Description | Example |
| --- | --- | --- |
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128, or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem (per node) or --mem-per-cpu (per CPU); cluster policy dependent | 32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |
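The inputs above map almost one-to-one onto #SBATCH directives. As a minimal sketch, assuming the inputs are collected in a dict keyed by sbatch long-option names, the hypothetical helper `render_directives` below emits one directive per non-empty input in a fixed order. It is an illustration, not the generator's actual code; in particular, `--gpus-per-node` is one real way to request GPUs, but some sites expect `--gres=gpu:N` instead.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[str, int]]

# Hypothetical ordering: emit directives in a stable, readable sequence.
DIRECTIVE_ORDER = [
    "job-name", "time", "partition", "account", "nodes",
    "ntasks", "ntasks-per-node", "cpus-per-task",
    "mem", "mem-per-cpu",
    "gpus-per-node",  # some sites use --gres=gpu:N instead
]


def render_directives(inputs: Dict[str, Value]) -> List[str]:
    """Return one '#SBATCH --key=value' line per non-empty input (hypothetical helper)."""
    lines = []
    for key in DIRECTIVE_ORDER:
        value = inputs.get(key)
        if value is None or value == "":
            continue  # optional inputs (partition, account, GPUs) may be absent
        lines.append(f"#SBATCH --{key}={value}")
    return lines


example = {
    "job-name": "phasefield-strong-scaling",
    "time": "00:30:00",
    "partition": "compute",
    "account": "matsim",
    "nodes": 2,
    "ntasks-per-node": 64,
    "cpus-per-task": 2,
    "mem": "32G",
}
print("\n".join(render_directives(example)))
```

Keeping the order fixed makes scripts from different team members diff cleanly, which helps when standardizing directives across a group.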

Decision Guidance

MPI vs MPI+OpenMP layout

Does the code use OpenMP / threading?
├── NO  → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
          and export OMP_NUM_THREADS = cpus-per-task

Rule of thumb: if strong-scaling efficiency drops off at high MPI rank counts, try fewer ranks with more threads per rank, and measure both layouts before committing.
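To compare layouts on a given node, it can help to enumerate them explicitly. The sketch below is a hypothetical helper (`hybrid_layout` is not part of the skill) that, assuming you know the usable cores per node, derives ranks per node, cpus-per-task, and the matching OMP_NUM_THREADS. MPI-only is simply the case `threads_per_rank == 1`.

```python
from typing import Dict


def hybrid_layout(cores_per_node: int, threads_per_rank: int, nodes: int) -> Dict[str, int]:
    """Fill every core on each node with ranks x threads (hypothetical helper)."""
    if cores_per_node < 1 or threads_per_rank < 1 or nodes < 1:
        raise ValueError("all counts must be positive")
    if cores_per_node % threads_per_rank != 0:
        # Leftover cores would sit idle; flag it instead of silently rounding.
        raise ValueError("threads_per_rank should divide cores_per_node evenly")
    ranks_per_node = cores_per_node // threads_per_rank
    return {
        "ntasks-per-node": ranks_per_node,
        "cpus-per-task": threads_per_rank,    # --cpus-per-task
        "OMP_NUM_THREADS": threads_per_rank,  # must match cpus-per-task
        "total_ranks": ranks_per_node * nodes,
    }


# Assumed 128-core nodes: pure MPI versus 4 threads per rank, on 2 nodes.
print(hybrid_layout(128, 1, 2))
print(hybrid_layout(128, 4, 2))
```

Scanning `threads_per_rank` over the divisors of the core count gives the candidate layouts to benchmark.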

Memory flag selection

  • Use either --mem (per node) or --mem-per-cpu (per CPU), not both.
  • Follow your cluster’s documentation; some sites enforce one style.
  • --mem and --mem-per-cpu take an integer in megabytes by default, or an integer with a K/M/G/T suffix; --mem=0 commonly requests all memory on the node.
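The memory rules above reduce to two checks: the value must look like `<int>[K|M|G|T]`, and only one of the two flags may be set. A minimal sketch, assuming a strict uppercase suffix; `mem_to_mb` and `memory_directive` are hypothetical names, not the skill's API.

```python
import re
from typing import Optional

# <int> optionally followed by K, M, G or T; a bare integer means megabytes.
_MEM_RE = re.compile(r"(\d+)([KMGT]?)")
_MB_PER_UNIT = {"K": 1 / 1024, "": 1, "M": 1, "G": 1024, "T": 1024 ** 2}


def mem_to_mb(spec: str) -> float:
    """Convert a SLURM-style memory string to megabytes (hypothetical helper)."""
    match = _MEM_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"memory must be <int>[K|M|G|T], got {spec!r}")
    number, unit = match.groups()
    return int(number) * _MB_PER_UNIT[unit]


def memory_directive(mem: Optional[str] = None, mem_per_cpu: Optional[str] = None) -> Optional[str]:
    """Return at most one memory directive; reject conflicting input."""
    if mem is not None and mem_per_cpu is not None:
        raise ValueError("Provide either --mem or --mem-per-cpu, not both")
    if mem is not None:
        mem_to_mb(mem)  # validate only; note --mem=0 usually means "all node memory"
        return f"#SBATCH --mem={mem}"
    if mem_per_cpu is not None:
        mem_to_mb(mem_per_cpu)
        return f"#SBATCH --mem-per-cpu={mem_per_cpu}"
    return None  # fall back to the site default
```

Converting to megabytes also makes per-node arithmetic easy: `mem_to_mb("2G") * 128 / 1024` gives 256.0, the GB a 128-CPU node would need under `--mem-per-cpu=2G`.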

Script Outputs (JSON Fields)

| Script | Key outputs |
| --- | --- |
| scripts/slurm_script_generator.py | results.script, results.directives, results.derived, results.warnings |
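The dotted field names suggest a nested JSON object. The sketch below assumes `--json` prints an object whose top-level `results` key holds those four fields and that `warnings` is a list of strings; both points are assumptions, and the sample payload is hypothetical, included only to show the assumed shape.

```python
import json
from typing import Any, Dict, List

EXPECTED_FIELDS = {"script", "directives", "derived", "warnings"}


def load_results(raw_json: str) -> Dict[str, Any]:
    """Parse generator output, assuming the fields sit under a top-level 'results' key."""
    results = json.loads(raw_json)["results"]
    missing = EXPECTED_FIELDS - set(results)
    if missing:
        raise KeyError(f"unexpected output shape, missing: {sorted(missing)}")
    return results


def must_review(results: Dict[str, Any]) -> List[str]:
    """Warnings (conflicts, suspicious layouts) worth reading before sbatch."""
    return [str(w) for w in results["warnings"]]


# Hypothetical sample, illustrating the assumed shape only.
sample = json.dumps({
    "results": {
        "script": "#!/bin/bash\n#SBATCH --job-name=phasefield\n",
        "directives": {"job-name": "phasefield"},
        "derived": {"total_tasks": 16},
        "warnings": ["no --partition given"],
    }
})
res = load_results(sample)
print(must_review(res))
```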

Workflow

  1. Gather cluster constraints (partition/account, GPU policy, memory policy).
  2. Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
  3. Generate the script with slurm_script_generator.py.
  4. Inspect warnings (conflicts, suspicious layouts).
  5. Save the generated script as job.sbatch.
  6. Submit with sbatch job.sbatch and monitor with squeue.
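For users who script step 6 from Python, a minimal sketch follows. It relies on two real Slurm options: `sbatch --parsable`, which prints only the job ID (or `jobid;cluster` on multi-cluster setups), and `squeue -j <jobid>`. The helper names are hypothetical, and the call uses an explicit argument list with no `shell=True`, matching the safety notes further down.

```python
import subprocess
from typing import List


def sbatch_argv(script_path: str) -> List[str]:
    # --parsable makes sbatch print "<jobid>" or "<jobid>;<cluster>".
    return ["sbatch", "--parsable", script_path]


def parse_job_id(stdout: str) -> int:
    """Extract the numeric job ID from sbatch --parsable output."""
    return int(stdout.strip().split(";", 1)[0])


def squeue_argv(job_id: int) -> List[str]:
    return ["squeue", "-j", str(job_id)]


def submit(script_path: str) -> int:
    """Submit a job and return its ID (requires sbatch on PATH)."""
    done = subprocess.run(sbatch_argv(script_path), capture_output=True, text=True, check=True)
    return parse_job_id(done.stdout)
```

Usage on a login node: `job_id = submit("job.sbatch")`, then `subprocess.run(squeue_argv(job_id), check=True)` to check its state.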

CLI Examples

```bash
# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --partition compute \
  --nodes 1 \
  --ntasks-per-node 8 \
  --cpus-per-task 2 \
  --mem 16G \
  --module gcc/12 \
  --module openmpi/4.1 \
  -- \
  ./simulate --config config.json

# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --nodes 1 \
  --ntasks 16 \
  --cpus-per-task 1 \
  --out job.sbatch \
  --json \
  -- \
  /bin/echo hello
```
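When the generator is driven from another Python tool, building the argument list explicitly keeps the run command intact and avoids shell quoting. A minimal sketch; `generator_argv` is a hypothetical helper, and it uses only the flags shown in the examples above (`--module` repeated, boolean `--json`, run command after `--`).

```python
import sys
from typing import Dict, List, Sequence, Union

GENERATOR = "skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py"


def generator_argv(
    options: Dict[str, Union[str, int, bool, None]],
    modules: Sequence[str],
    run_cmd: Sequence[str],
) -> List[str]:
    """Build an explicit argv for slurm_script_generator.py (hypothetical helper)."""
    argv = [sys.executable, GENERATOR]
    for flag, value in options.items():
        if value is True:
            argv.append(f"--{flag}")  # boolean switch such as --json
        elif value is not None and value is not False:
            argv += [f"--{flag}", str(value)]
    for module in modules:
        argv += ["--module", module]  # --module is repeatable
    if not run_cmd:
        raise ValueError("Provide a run command after --")
    return argv + ["--", *run_cmd]


# Equivalent of the first CLI example; pass the result to subprocess.run(argv, check=True).
argv = generator_argv(
    {"job-name": "phasefield", "time": "00:10:00", "partition": "compute",
     "nodes": 1, "ntasks-per-node": 8, "cpus-per-task": 2, "mem": "16G"},
    ["gcc/12", "openmpi/4.1"],
    ["./simulate", "--config", "config.json"],
)
print(argv[2:])
```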

Conversational Workflow Example

User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.

Agent workflow:

  1. Confirm partition/account and whether GPUs are needed.
  2. Generate a hybrid job script:
    ```bash
    python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- ./simulate
    ```
  3. Explain the mapping:
    • Total ranks = 2 nodes × 64 ranks per node = 128
    • Threads per rank = 2 (OMP_NUM_THREADS=2)
    • CPUs per node = 64 × 2 = 128, so each node needs 128 available CPUs to avoid oversubscription
  4. If the user provides node core counts, sanity-check oversubscription using --cores-per-node.
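The oversubscription check in step 4 is simple arithmetic: ranks per node times CPUs per task must not exceed the cores per node. A sketch under stated assumptions: `oversubscription_warnings` is a hypothetical helper, and "cores" here means the CPUs SLURM can hand out per node, which may or may not include hardware threads depending on the site.

```python
from typing import List


def oversubscription_warnings(ntasks_per_node: int, cpus_per_task: int, cores_per_node: int) -> List[str]:
    """Compare requested CPUs per node against available cores (hypothetical helper)."""
    used = ntasks_per_node * cpus_per_task
    warnings = []
    if used > cores_per_node:
        warnings.append(
            f"oversubscribed: {ntasks_per_node} x {cpus_per_task} = {used} CPUs > {cores_per_node} cores per node"
        )
    elif used < cores_per_node:
        warnings.append(f"underused: {cores_per_node - used} of {cores_per_node} cores idle per node")
    return warnings


# The conversation above: 64 ranks x 2 threads = 128 CPUs per node.
print(oversubscription_warnings(64, 2, 128))  # fits exactly: []
print(oversubscription_warnings(64, 2, 96))   # too many CPUs requested
```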

Error Handling

| Error | Cause | Resolution |
| --- | --- | --- |
| time must be HH:MM:SS or D-HH:MM:SS | Bad walltime format | Use 00:30:00 or 1-00:00:00 |
| nodes must be positive | Non-positive node count | Provide --nodes >= 1 |
| Provide either --mem or --mem-per-cpu, not both | Conflicting memory directives | Choose one memory style |
| Provide a run command after -- | Missing launch command | Add -- ./simulate ... |
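The walltime error above comes from a format check. SLURM itself accepts more time formats than these two, but this skill restricts input to HH:MM:SS and D-HH:MM:SS. The pattern below is an assumed reconstruction of such a check, not the script's actual regex; `walltime_seconds` is a hypothetical name, and it also converts the limit to seconds for easy comparison.

```python
import re

# Assumed pattern: optional "D-" prefix, two-digit hours, minutes/seconds 00-59.
_WALLTIME_RE = re.compile(r"(?:(\d+)-)?(\d{2}):([0-5]\d):([0-5]\d)")


def walltime_seconds(spec: str) -> int:
    """Validate HH:MM:SS or D-HH:MM:SS and return the limit in seconds."""
    match = _WALLTIME_RE.fullmatch(spec)
    if match is None:
        raise ValueError("time must be HH:MM:SS or D-HH:MM:SS")
    days, hours, minutes, seconds = match.groups()
    if days is not None and int(hours) > 23:
        # With a day prefix, hours should stay within one day.
        raise ValueError("time must be HH:MM:SS or D-HH:MM:SS")
    return int(days or 0) * 86400 + int(hours) * 3600 + int(minutes) * 60 + int(seconds)


print(walltime_seconds("00:30:00"))
print(walltime_seconds("1-00:00:00"))
```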

Security

Input Validation

  • --time is validated against strict HH:MM:SS or D-HH:MM:SS format via regex
  • --nodes, --ntasks, --ntasks-per-node, --cpus-per-task, --gpus are validated as positive integers with upper bounds
  • --mem and --mem-per-cpu are validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected
  • --job-name is validated against [a-zA-Z0-9_.-]+ (no shell metacharacters)
  • --partition and --account are validated against safe-character allowlists
  • --module values are validated to prevent shell injection (no ;, |, &, backticks, or $)
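The job-name and module rules above combine an allowlist with a denylist. A minimal sketch of both, using the allowlist and forbidden characters listed above; the function names are hypothetical, and rejecting newlines is an extra precaution assumed here (a newline would start a new shell command), not something the list states.

```python
import re

JOB_NAME_RE = re.compile(r"[a-zA-Z0-9_.-]+")  # allowlist quoted above
# Characters listed above, plus newlines (an added precaution, not in the list).
FORBIDDEN_IN_MODULE = set(";|&`$") | {"\n", "\r"}


def validate_job_name(name: str) -> str:
    if not JOB_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid --job-name: {name!r}")
    return name


def validate_module(module: str) -> str:
    bad = FORBIDDEN_IN_MODULE.intersection(module)
    if bad or not module.strip():
        raise ValueError(f"invalid --module: {module!r}")
    return module


print(validate_job_name("phasefield-strong-scaling"))
print(validate_module("openmpi/4.1"))
```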

File Access

  • The script reads no external files; all inputs are provided via CLI arguments
  • --out writes the generated sbatch script to a single specified file path
  • The generated script is a plain-text shell script with #SBATCH directives; it contains no dynamically generated code

Tool Restrictions

  • Read: Used to inspect script source, references, and existing job scripts
  • Bash: Used to execute slurm_script_generator.py with explicit argument lists; the generated script itself is NOT executed by the agent
  • Write: Used to save the generated .sbatch file; writes are scoped to the user's working directory
  • Grep/Glob: Used to locate existing scripts, configs, and cluster documentation

Safety Measures

  • No eval(), exec(), or dynamic code generation
  • All subprocess calls use explicit argument lists (no shell=True)
  • The run command (after --) is included verbatim in the generated script but is never executed by the skill itself
  • Module names are sanitized to prevent injection into module load directives
  • Generated scripts use set -euo pipefail for safe shell execution on the cluster
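Putting the safety measures together, the body of a generated script might be assembled as below. This is a sketch under assumptions: `render_script` is hypothetical, and the `srun` launcher prefix and the `cd "$SLURM_SUBMIT_DIR"` line are illustrative choices that the real generator may not make. The run command is written verbatim and never executed.

```python
from typing import List, Sequence


def render_script(
    directives: Sequence[str],
    modules: Sequence[str],
    cpus_per_task: int,
    run_command: str,
    launcher: str = "srun",  # assumption: the real generator's launcher may differ
) -> str:
    """Assemble a plain-text sbatch script (hypothetical helper)."""
    lines: List[str] = ["#!/bin/bash", *directives, ""]
    # Exit on errors, on unset variables, and on failures inside pipelines.
    lines.append("set -euo pipefail")
    lines.append('cd "$SLURM_SUBMIT_DIR"')
    for module in modules:
        lines.append(f"module load {module}")  # names assumed already validated
    # Keep OpenMP threads in step with --cpus-per-task.
    lines.append(f"export OMP_NUM_THREADS={cpus_per_task}")
    # The run command is copied verbatim; this function only writes text.
    lines.append(f"{launcher} {run_command}".strip())
    return "\n".join(lines) + "\n"


print(render_script(
    ["#SBATCH --job-name=phasefield", "#SBATCH --time=00:10:00"],
    ["gcc/12", "openmpi/4.1"],
    2,
    "./simulate --config config.json",
))
```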

Limitations

  • Does not query cluster hardware or site policies; it can only validate internal consistency.
  • SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.

References

  • references/slurm_directives.md - Common #SBATCH directives and mapping tips

Version History

  • v1.0.0 (2026-02-25): Initial SLURM job script generator
