
Agent skill

slurm-job-script-generator

Generate correct, copy-pasteable SLURM sbatch job scripts and sanity-check HPC resource requests — configure nodes, MPI tasks, OpenMP threads, memory (per-node or per-cpu), GPUs, walltime, partitions, modules, and environment variables, with automatic detection of conflicting directives and oversubscription. Use when preparing a SLURM submission script, deciding between pure MPI and hybrid MPI+OpenMP layouts, standardizing #SBATCH directives across a team, debugging why a job won't launch or gets killed, or setting up GPU-accelerated simulation jobs, even if the user only says "I need to run this on the cluster" or "my job keeps getting killed."



Install this agent skill to your Project

npx add-skill https://github.com/HeshamFS/materials-simulation-skills/tree/main/skills/hpc-deployment/slurm-job-script-generator

Metadata

Additional technical details for this skill

author: HeshamFS
version: 1.1.0
eval cases: 2
tested with: [ "claude-code", "gemini-cli", "vs-code-copilot" ]
last reviewed: 2026-03-26
security tier: high
security reviewed: YES

SKILL.md

SLURM Job Script Generator

Goal

Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, likely CPU oversubscription).

Requirements

Python 3.8+
No external dependencies (Python standard library only)
Works on Linux, macOS, and Windows (script generation only)

Inputs to Gather

Input	Description	Example
Job name	Short identifier for the job	`phasefield-strong-scaling`
Walltime	SLURM time limit	`00:30:00`
Partition	Cluster partition/queue (if required)	`compute`
Account	Project/account (if required)	`matsim`
Nodes	Number of nodes to allocate	`2`
MPI tasks	Total tasks, or tasks per node	`128` or `64` per node
Threads	CPUs per task (OpenMP threads)	`2`
Memory	`--mem` or `--mem-per-cpu` (cluster policy dependent)	`32G`
GPUs	GPUs per node (optional)	`4`
Working directory	Where the run should execute	`$SLURM_SUBMIT_DIR`
Modules	Environment modules to load (optional)	`gcc/12`, `openmpi/4.1`
Run command	The command to launch under SLURM	`./simulate --config cfg.json`

Decision Guidance

MPI vs MPI+OpenMP layout

Does the code use OpenMP / threading?
├── NO  → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
          and export OMP_NUM_THREADS = cpus-per-task

Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
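To make "fewer ranks with more threads" concrete, the sketch below lists every rank/thread split that exactly fills one node, with OMP_NUM_THREADS tied to cpus-per-task as in the tree above. It is a minimal illustration assuming a known core count per node and equal threads per rank; the function name and the 128-core node are hypothetical, not part of the generator.

```python
from typing import List, Tuple


def node_layouts(cores_per_node: int) -> List[Tuple[int, int]]:
    """Return (ntasks_per_node, cpus_per_task) pairs that exactly fill one node.

    cpus_per_task == 1 is the MPI-only layout; any larger value is a hybrid
    layout in which OMP_NUM_THREADS should be set to cpus_per_task.
    """
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be positive")
    return [
        (cores_per_node // threads, threads)
        for threads in range(1, cores_per_node + 1)
        if cores_per_node % threads == 0
    ]


if __name__ == "__main__":
    # Hypothetical 128-core node: pure MPI first, then fewer, wider ranks.
    for ranks, threads in node_layouts(128):
        print(f"--ntasks-per-node={ranks} --cpus-per-task={threads} "
              f"OMP_NUM_THREADS={threads}")
```

Each candidate still has to be benchmarked; the list only narrows down which layouts are worth measuring.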

Memory flag selection

Use either --mem (per node) or --mem-per-cpu (per CPU), not both.
Follow your cluster’s documentation; some sites enforce one style.
SLURM reads a bare integer for --mem or --mem-per-cpu as megabytes; an integer with a K/M/G/T suffix selects other units. On many sites, --mem=0 means “all memory on the node.”
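The memory rules above fit in a few lines of validation. This is a minimal sketch, not the generator's code: it assumes each suffix scales by 1024 (as SLURM does), rounds kilobyte values down to whole megabytes, and uses hypothetical function names.

```python
import re
from typing import Optional

# <int> followed by an optional unit suffix; a bare integer means megabytes.
_MEM_RE = re.compile(r"([0-9]+)([KMGT]?)")
_MB_PER_UNIT = {"": 1, "M": 1, "G": 1024, "T": 1024 ** 2}


def mem_to_mb(value: str) -> int:
    """Convert a SLURM-style memory value such as '16G' or '2048' to MB."""
    match = _MEM_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"memory must be <int>[K|M|G|T], got {value!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if suffix == "K":
        return number // 1024  # kilobytes, rounded down to whole MB
    return number * _MB_PER_UNIT[suffix]


def memory_directive(mem: Optional[str] = None,
                     mem_per_cpu: Optional[str] = None) -> Optional[str]:
    """Return one #SBATCH memory line, refusing to emit both styles."""
    if mem is not None and mem_per_cpu is not None:
        raise ValueError("Provide either --mem or --mem-per-cpu, not both")
    if mem is not None:
        mem_to_mb(mem)  # validate the format; the original string is kept
        return f"#SBATCH --mem={mem}"
    if mem_per_cpu is not None:
        mem_to_mb(mem_per_cpu)
        return f"#SBATCH --mem-per-cpu={mem_per_cpu}"
    return None  # leave memory to the site default
```

Keeping the user's original string in the directive (rather than the converted MB value) avoids surprising rewrites of values that site documentation may quote verbatim.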

Script Outputs (JSON Fields)

Script	Key Outputs
`scripts/slurm_script_generator.py`	`results.script`, `results.directives`, `results.derived`, `results.warnings`
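When the generator runs with --json, a caller can check the warnings before saving or submitting anything. A minimal sketch, assuming the JSON is printed to stdout as a top-level `results` object holding the four fields listed above, with `warnings` as a list of strings; that exact shape is an assumption, so compare it with real output first. The helper name and the sample payload are hypothetical.

```python
import json
from typing import List


def generator_warnings(raw: str) -> List[str]:
    """Return warnings from --json output, assuming a top-level "results" object."""
    results = json.loads(raw).get("results", {})
    if "script" not in results:
        raise KeyError("no results.script in generator output")
    return [str(w) for w in results.get("warnings", [])]


if __name__ == "__main__":
    # Hypothetical sample shaped like the fields listed above, not real output.
    sample = json.dumps({"results": {"script": "#!/bin/bash\n",
                                     "directives": {}, "derived": {},
                                     "warnings": ["example warning"]}})
    for message in generator_warnings(sample):
        print("WARNING:", message)
```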

Workflow

Gather cluster constraints (partition/account, GPU policy, memory policy).
Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
Generate the script with slurm_script_generator.py.
Inspect warnings (conflicts, suspicious layouts).
Save the generated script as job.sbatch.
Submit with sbatch job.sbatch and monitor with squeue.
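The submit-and-monitor step can be scripted. A minimal sketch, assuming `sbatch` and `squeue` are on PATH; `sbatch --parsable` prints only the job ID, optionally followed by `;cluster`. The function names are hypothetical, and only the parsing helper can run without a cluster.

```python
import subprocess
from typing import List


def parse_parsable_jobid(stdout: str) -> str:
    """Extract the job ID from `sbatch --parsable` output ("<jobid>[;<cluster>]")."""
    lines = stdout.strip().splitlines()
    job_id = lines[0].split(";", 1)[0] if lines else ""
    if not job_id.isdigit():
        raise ValueError(f"unexpected sbatch output: {stdout!r}")
    return job_id


def submit(script_path: str = "job.sbatch") -> str:
    """Submit with an explicit argument list (no shell) and return the job ID."""
    result = subprocess.run(["sbatch", "--parsable", script_path],
                            check=True, capture_output=True, text=True)
    return parse_parsable_jobid(result.stdout)


def squeue_argv(job_id: str) -> List[str]:
    """Argument list for checking one job with squeue, e.g. via subprocess.run."""
    return ["squeue", "--jobs", job_id]
```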

CLI Examples

bash

# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --partition compute \
  --nodes 1 \
  --ntasks-per-node 8 \
  --cpus-per-task 2 \
  --mem 16G \
  --module gcc/12 \
  --module openmpi/4.1 \
  -- \
  ./simulate --config config.json

# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --nodes 1 \
  --ntasks 16 \
  --cpus-per-task 1 \
  --out job.sbatch \
  --json \
  -- \
  /bin/echo hello
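For orientation, the sketch below renders gathered inputs into a script with the general shape this skill describes: #SBATCH lines, set -euo pipefail, module load lines, OMP_NUM_THREADS, and a launch line. It is a hypothetical renderer for illustration, not the generator's implementation; the use of srun and the exact line layout are assumptions.

```python
import shlex
from typing import Dict, List


def render_sbatch(directives: Dict[str, str], modules: List[str],
                  command: List[str], cpus_per_task: int = 1) -> str:
    """Render a minimal sbatch script (hypothetical sketch, not the generator)."""
    lines = ["#!/bin/bash"]
    # All #SBATCH lines must precede the first shell command to be honored.
    lines += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    lines.append(f"#SBATCH --cpus-per-task={cpus_per_task}")
    lines += ["", "set -euo pipefail", ""]
    lines += [f"module load {name}" for name in modules]
    lines.append(f"export OMP_NUM_THREADS={cpus_per_task}")
    # Pass --cpus-per-task to srun explicitly; some Slurm releases do not
    # inherit it from the batch allocation. shlex.join (3.8+) quotes arguments.
    lines.append(f"srun --cpus-per-task={cpus_per_task} " + shlex.join(command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Mirrors the first CLI example above.
    print(render_sbatch(
        {"job-name": "phasefield", "time": "00:10:00", "partition": "compute",
         "nodes": "1", "ntasks-per-node": "8", "mem": "16G"},
        ["gcc/12", "openmpi/4.1"],
        ["./simulate", "--config", "config.json"],
        cpus_per_task=2,
    ))
```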

Conversational Workflow Example

User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.

Agent workflow:

Confirm partition/account and whether GPUs are needed.

Generate a hybrid job script:

bash

python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- ./simulate

Explain the mapping:
- Total ranks = 128
- Threads per rank = 2 (OMP_NUM_THREADS=2)
If the user provides node core counts, sanity-check oversubscription using --cores-per-node.
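The mapping in this example is plain arithmetic. A minimal sketch of the derived totals and the oversubscription check, assuming cores_per_node is supplied by the user or site documentation (the skill cannot query hardware) and that one SLURM CPU corresponds to one physical core; the function name and report keys are hypothetical.

```python
from typing import Dict, List, Optional, Union


def check_layout(nodes: int, ntasks_per_node: int, cpus_per_task: int,
                 cores_per_node: Optional[int] = None
                 ) -> Dict[str, Union[int, List[str]]]:
    """Derive totals for an MPI/hybrid layout and flag likely oversubscription."""
    cpus_per_node = ntasks_per_node * cpus_per_task
    warnings: List[str] = []
    if cores_per_node is not None and cpus_per_node > cores_per_node:
        warnings.append(f"oversubscribed: {cpus_per_node} CPUs requested per node, "
                        f"{cores_per_node} cores available")
    return {
        "total_ranks": nodes * ntasks_per_node,
        "omp_num_threads": cpus_per_task,
        "cpus_per_node": cpus_per_node,
        "total_cpus": nodes * cpus_per_node,
        "warnings": warnings,
    }


if __name__ == "__main__":
    # The example request: 2 nodes x 64 ranks x 2 threads.
    print(check_layout(2, 64, 2, cores_per_node=128))  # fits exactly
    print(check_layout(2, 64, 2, cores_per_node=96))   # flags oversubscription
```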

Error Handling

Error	Cause	Resolution
`time must be HH:MM:SS or D-HH:MM:SS`	Bad walltime format	Use `00:30:00` or `1-00:00:00`
`nodes must be positive`	Non-positive nodes	Provide `--nodes >= 1`
`Provide either --mem or --mem-per-cpu, not both`	Conflicting memory directives	Choose one memory style
`Provide a run command after --`	Missing launch command	Add `-- ./simulate ...`
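The walltime error in the first row can be reproduced with a small regex. A minimal sketch that accepts exactly the two formats named in that message and returns seconds; it deliberately rejects other forms SLURM itself accepts (such as MM:SS or D-HH), and the range rules (MM and SS below 60, HH below 24 when days are given, at most two HH digits) are assumptions that may differ from the generator.

```python
import re

# Optional "D-" day prefix, then HH:MM:SS.
_TIME_RE = re.compile(r"(?:([0-9]+)-)?([0-9]{1,2}):([0-9]{2}):([0-9]{2})")
_TIME_ERROR = "time must be HH:MM:SS or D-HH:MM:SS"


def walltime_seconds(value: str) -> int:
    """Validate a walltime string and return it in seconds."""
    match = _TIME_RE.fullmatch(value)
    if match is None:
        raise ValueError(_TIME_ERROR)
    days = int(match.group(1) or 0)
    hours, minutes, seconds = (int(g) for g in match.group(2, 3, 4))
    # Assumed range rules: MM and SS below 60, HH below 24 when days are given.
    if minutes >= 60 or seconds >= 60 or (match.group(1) and hours >= 24):
        raise ValueError(_TIME_ERROR)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Using fullmatch rather than a `$` anchor matters here: `$` also matches before a trailing newline, so "00:30:00\n" would otherwise slip through.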

Security

Input Validation

--time is validated against strict HH:MM:SS or D-HH:MM:SS format via regex
--nodes, --ntasks, --ntasks-per-node, --cpus-per-task, --gpus are validated as positive integers with upper bounds
--mem and --mem-per-cpu are validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected
--job-name is validated against [a-zA-Z0-9_.-]+ (no shell metacharacters)
--partition and --account are validated against safe-character allowlists
--module values are validated to prevent shell injection (no ;, |, &, backticks, or $)
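A minimal sketch of the two string checks whose rules are stated exactly above: --job-name against [a-zA-Z0-9_.-]+, and --module rejecting ;, |, &, backticks, and $. As an assumption beyond the stated rules, this sketch also rejects whitespace in module names, because a newline would start a new command in the generated script; whether the real generator does the same is not stated. Function names are hypothetical.

```python
import re

_JOB_NAME_RE = re.compile(r"[a-zA-Z0-9_.-]+")
_MODULE_FORBIDDEN = frozenset(";|&`$")


def validate_job_name(name: str) -> str:
    """Allowlist check: letters, digits, underscore, dot, and hyphen only."""
    if _JOB_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid --job-name: {name!r}")
    return name


def validate_module(module: str) -> str:
    """Reject shell metacharacters and (sketch-only rule) any whitespace."""
    bad = _MODULE_FORBIDDEN.intersection(module)
    if not module or bad or any(ch.isspace() for ch in module):
        raise ValueError(f"invalid --module value: {module!r}")
    return module
```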

File Access

The script reads no external files; all inputs are provided via CLI arguments
--out writes the generated sbatch script to a single specified file path
The generated script is a plain-text shell script with #SBATCH directives; it contains no dynamically generated code

Tool Restrictions

Read: Used to inspect script source, references, and existing job scripts
Bash: Used to execute slurm_script_generator.py with explicit argument lists; the generated script itself is NOT executed by the agent
Write: Used to save the generated .sbatch file; writes are scoped to the user's working directory
Grep/Glob: Used to locate existing scripts, configs, and cluster documentation

Safety Measures

No eval(), exec(), or dynamic code generation
All subprocess calls use explicit argument lists (no shell=True)
The run command (after --) is included verbatim in the generated script but is never executed by the skill itself
Module names are sanitized to prevent injection into module load directives
Generated scripts use set -euo pipefail for safe shell execution on the cluster
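To show why explicit argument lists matter, the sketch below hands a string full of shell metacharacters to a child process without shell=True: it arrives as one literal argument and nothing is interpreted. It runs the current interpreter (sys.executable) rather than the generator, so it works on any platform; the helper name is hypothetical.

```python
import subprocess
import sys


def echo_argument(arg: str) -> str:
    """Run a child interpreter that prints its first argument verbatim."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    payload = "run; echo injected && $(whoami)"
    print(echo_argument(payload))  # printed literally; no shell ever runs
```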

Limitations

Does not query cluster hardware or site policies; it can only validate internal consistency.
SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.

References

references/slurm_directives.md - Common #SBATCH directives and mapping tips

Version History

v1.0.0 (2026-02-25): Initial SLURM job script generator

Maintainer

HeshamFS (core maintainer)

Source details

Full Name: HeshamFS/materials-simulation-skills
Branch: main
Path in repo: skills/hpc-deployment/slurm-job-script-generator
License: Apache License 2.0
Topics: agent-skills cli-tools skills agents llm materials-science computational-science numerical-methods simulation


Recommended Agent Skills


HeshamFS/materials-simulation-skills

post-processing

Extract, analyze, and summarize simulation output data — pull spatial fields at specific timesteps, compute time-series trends and detect steady state, extract line profiles through the domain, generate statistical summaries and distributions, calculate derived quantities (gradients, fluxes, volume fractions, interface area), compare results against analytical solutions or experimental data, and produce automated analysis reports. Use when interpreting finished simulation results, checking mass or energy conservation, comparing two runs or meshes, extracting interface profiles from phase-field output, or preparing publication-quality analysis, even if the user only says "what do my results look like" or "did my simulation reach steady state."


HeshamFS/materials-simulation-skills

performance-profiling

Identify computational bottlenecks, analyze parallel scaling, estimate memory requirements, and generate optimization recommendations for materials simulations — parse timing logs to find dominant phases (solver, assembly, I/O), evaluate strong and weak scaling efficiency, profile memory from mesh and field parameters, and detect bottlenecks with actionable fix suggestions. Use when a simulation is running slower than expected, investigating MPI scaling efficiency, planning HPC resource allocation, deciding whether to tune the preconditioner or reduce I/O frequency, or estimating if a problem fits in available RAM, even if the user only says "my simulation is too slow" or "how many nodes do I need."


HeshamFS/materials-simulation-skills

parameter-optimization

Explore and optimize simulation parameters via design of experiments (DOE), sensitivity analysis, and optimizer selection — generate Latin Hypercube, quasi-random, or factorial sample plans, rank parameter influence with sensitivity scores, recommend Bayesian optimization, CMA-ES, or gradient-based methods based on dimension and budget, and fit surrogate models for expensive evaluations. Use when calibrating material properties against experimental data, planning a parameter sweep, performing uncertainty quantification, or choosing an optimization strategy for a simulation with a limited evaluation budget, even if the user only says "which parameters matter most" or "how do I calibrate my model."


HeshamFS/materials-simulation-skills

simulation-validator

Validate simulations across three stages — run pre-flight checks on configuration files (parameter ranges, required fields, disk space), monitor runtime logs for residual growth, NaN/Inf, and adaptive dt collapse, and perform post-flight validation of results (physical bounds, mass/energy conservation, convergence). Diagnose failed simulations with probable-cause analysis and recommended fixes. Use when preparing to launch a simulation, checking whether a running job is healthy, verifying that finished results are trustworthy, or debugging a crash or blow-up, even if the user only says "my simulation crashed" or "can I trust these results."


HeshamFS/materials-simulation-skills

simulation-orchestrator

Orchestrate multi-simulation campaigns — generate parameter sweep configurations (grid, linspace, or Latin Hypercube sampling), initialize and track batch job campaigns, monitor job completion status, and aggregate results with summary statistics across all runs. Use when running a parameter study across dt, kappa, or other simulation inputs, managing dozens or hundreds of simulation configurations, combining outputs from completed batch runs to find the best result, or automating the generate-run-collect workflow for systematic studies, even if the user only says "I need to try many parameter combinations" or "how do I organize a sweep."


HeshamFS/materials-simulation-skills

ontology-explorer

Parse, navigate, and query materials science ontology structures — browse class hierarchies, inspect individual classes and their properties, look up object and data property definitions with domain/range, search for ontology terms by keyword, and parse or summarize raw OWL/XML files. Supports the OCDO ecosystem (CMSO, ASMO, CDCO, PODO, PLDO, LDO). Use when exploring what classes or properties an ontology provides, finding the right CMSO term for a crystal structure or simulation concept, understanding parent-child class relationships, or onboarding to an unfamiliar materials ontology, even if the user only says "what ontology terms describe my FCC copper simulation" or "show me the CMSO class hierarchy."
