Agent skill
slurm-job-script-generator
Generate correct, copy-pasteable SLURM sbatch job scripts and sanity-check HPC resource requests — configure nodes, MPI tasks, OpenMP threads, memory (per-node or per-cpu), GPUs, walltime, partitions, modules, and environment variables, with automatic detection of conflicting directives and oversubscription. Use when preparing a SLURM submission script, deciding between pure MPI and hybrid MPI+OpenMP layouts, standardizing #SBATCH directives across a team, debugging why a job won't launch or gets killed, or setting up GPU-accelerated simulation jobs, even if the user only says "I need to run this on the cluster" or "my job keeps getting killed."
Install this agent skill to your Project
npx add-skill https://github.com/HeshamFS/materials-simulation-skills/tree/main/skills/hpc-deployment/slurm-job-script-generator
Metadata
Additional technical details for this skill
- author
- HeshamFS
- version
- 1.1.0
- eval cases
- 2
- tested with
-
[ "claude-code", "gemini-cli", "vs-code-copilot" ] - last reviewed
- 2026-03-26
- security tier
- high
- security reviewed
- YES
SKILL.md
SLURM Job Script Generator
Goal
Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).
Requirements
- Python 3.8+
- No external dependencies (Python standard library only)
- Works on Linux, macOS, and Windows (script generation only)
Inputs to Gather
| Input | Description | Example |
|---|---|---|
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128 or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem or --mem-per-cpu (cluster policy dependent) |
32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |
Decision Guidance
MPI vs MPI+OpenMP layout
Does the code use OpenMP / threading?
├── NO → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
and export OMP_NUM_THREADS = cpus-per-task
Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
Memory flag selection
- Use either
--mem(per node) or--mem-per-cpu(per CPU), not both. - Follow your cluster’s documentation; some sites enforce one style.
- SLURM
--memunits are integer MB by default, or an integer with suffixK/M/G/T(and--mem=0commonly means “all memory on node”).
Script Outputs (JSON Fields)
| Script | Key Outputs |
|---|---|
scripts/slurm_script_generator.py |
results.script, results.directives, results.derived, results.warnings |
Workflow
- Gather cluster constraints (partition/account, GPU policy, memory policy).
- Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
- Generate the script with
slurm_script_generator.py. - Inspect warnings (conflicts, suspicious layouts).
- Save the generated script as
job.sbatch. - Submit with
sbatch job.sbatchand monitor withsqueue.
CLI Examples
# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--partition compute \
--nodes 1 \
--ntasks-per-node 8 \
--cpus-per-task 2 \
--mem 16G \
--module gcc/12 \
--module openmpi/4.1 \
-- \
./simulate --config config.json
# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--nodes 1 \
--ntasks 16 \
--cpus-per-task 1 \
--out job.sbatch \
--json \
-- \
/bin/echo hello
Conversational Workflow Example
User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.
Agent workflow:
- Confirm partition/account and whether GPUs are needed.
- Generate a hybrid job script:
bash
python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- -- ./simulate - Explain the mapping:
- Total ranks = 128
- Threads per rank = 2 (
OMP_NUM_THREADS=2)
- If the user provides node core counts, sanity-check oversubscription using
--cores-per-node.
Error Handling
| Error | Cause | Resolution |
|---|---|---|
time must be HH:MM:SS or D-HH:MM:SS |
Bad walltime format | Use 00:30:00 or 1-00:00:00 |
nodes must be positive |
Non-positive nodes | Provide --nodes >= 1 |
Provide either --mem or --mem-per-cpu, not both |
Conflicting memory directives | Choose one memory style |
Provide a run command after -- |
Missing launch command | Add -- ./simulate ... |
Security
Input Validation
--timeis validated against strictHH:MM:SSorD-HH:MM:SSformat via regex--nodes,--ntasks,--ntasks-per-node,--cpus-per-task,--gpusare validated as positive integers with upper bounds--memand--mem-per-cpuare validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected--job-nameis validated against[a-zA-Z0-9_.-]+(no shell metacharacters)--partitionand--accountare validated against safe-character allowlists--modulevalues are validated to prevent shell injection (no;,|,&, backticks, or$)
File Access
- The script reads no external files; all inputs are provided via CLI arguments
--outwrites the generated sbatch script to a single specified file path- The generated script is a plain-text shell script with
#SBATCHdirectives; it contains no dynamically generated code
Tool Restrictions
- Read: Used to inspect script source, references, and existing job scripts
- Bash: Used to execute
slurm_script_generator.pywith explicit argument lists; the generated script itself is NOT executed by the agent - Write: Used to save the generated
.sbatchfile; writes are scoped to the user's working directory - Grep/Glob: Used to locate existing scripts, configs, and cluster documentation
Safety Measures
- No
eval(),exec(), or dynamic code generation - All subprocess calls use explicit argument lists (no
shell=True) - The run command (after
--) is included verbatim in the generated script but is never executed by the skill itself - Module names are sanitized to prevent injection into
module loaddirectives - Generated scripts use
set -euo pipefailfor safe shell execution on the cluster
Limitations
- Does not query cluster hardware or site policies; it can only validate internal consistency.
- SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.
References
references/slurm_directives.md- Common#SBATCHdirectives and mapping tips
Version History
- v1.0.0 (2026-02-25): Initial SLURM job script generator
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
post-processing
Extract, analyze, and summarize simulation output data — pull spatial fields at specific timesteps, compute time-series trends and detect steady state, extract line profiles through the domain, generate statistical summaries and distributions, calculate derived quantities (gradients, fluxes, volume fractions, interface area), compare results against analytical solutions or experimental data, and produce automated analysis reports. Use when interpreting finished simulation results, checking mass or energy conservation, comparing two runs or meshes, extracting interface profiles from phase-field output, or preparing publication-quality analysis, even if the user only says "what do my results look like" or "did my simulation reach steady state."
performance-profiling
Identify computational bottlenecks, analyze parallel scaling, estimate memory requirements, and generate optimization recommendations for materials simulations — parse timing logs to find dominant phases (solver, assembly, I/O), evaluate strong and weak scaling efficiency, profile memory from mesh and field parameters, and detect bottlenecks with actionable fix suggestions. Use when a simulation is running slower than expected, investigating MPI scaling efficiency, planning HPC resource allocation, deciding whether to tune the preconditioner or reduce I/O frequency, or estimating if a problem fits in available RAM, even if the user only says "my simulation is too slow" or "how many nodes do I need."
parameter-optimization
Explore and optimize simulation parameters via design of experiments (DOE), sensitivity analysis, and optimizer selection — generate Latin Hypercube, quasi-random, or factorial sample plans, rank parameter influence with sensitivity scores, recommend Bayesian optimization, CMA-ES, or gradient- based methods based on dimension and budget, and fit surrogate models for expensive evaluations. Use when calibrating material properties against experimental data, planning a parameter sweep, performing uncertainty quantification, or choosing an optimization strategy for a simulation with a limited evaluation budget, even if the user only says "which parameters matter most" or "how do I calibrate my model."
simulation-validator
Validate simulations across three stages — run pre-flight checks on configuration files (parameter ranges, required fields, disk space), monitor runtime logs for residual growth, NaN/Inf, and adaptive dt collapse, and perform post-flight validation of results (physical bounds, mass/energy conservation, convergence). Diagnose failed simulations with probable-cause analysis and recommended fixes. Use when preparing to launch a simulation, checking whether a running job is healthy, verifying that finished results are trustworthy, or debugging a crash or blow-up, even if the user only says "my simulation crashed" or "can I trust these results."
simulation-orchestrator
Orchestrate multi-simulation campaigns — generate parameter sweep configurations (grid, linspace, or Latin Hypercube sampling), initialize and track batch job campaigns, monitor job completion status, and aggregate results with summary statistics across all runs. Use when running a parameter study across dt, kappa, or other simulation inputs, managing dozens or hundreds of simulation configurations, combining outputs from completed batch runs to find the best result, or automating the generate-run-collect workflow for systematic studies, even if the user only says "I need to try many parameter combinations" or "how do I organize a sweep."
ontology-explorer
Parse, navigate, and query materials science ontology structures — browse class hierarchies, inspect individual classes and their properties, look up object and data property definitions with domain/range, search for ontology terms by keyword, and parse or summarize raw OWL/XML files. Supports the OCDO ecosystem (CMSO, ASMO, CDCO, PODO, PLDO, LDO). Use when exploring what classes or properties an ontology provides, finding the right CMSO term for a crystal structure or simulation concept, understanding parent-child class relationships, or onboarding to an unfamiliar materials ontology, even if the user only says "what ontology terms describe my FCC copper simulation" or "show me the CMSO class hierarchy."
Didn't find tool you were looking for?