# HPC Job Templates
Mycelia now supports a single canonical `JobSpec` that renders to multiple execution backends (sketched below):

- SLURM batch scripts (`render_sbatch`)
- SLURM interactive allocation commands (`render_salloc`)
- Docker scripts (`render_docker_run`)
- Cloud Build YAML (`render_cloudbuild`)
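As a hedged sketch of how these fit together (field names follow the Quick Example below; the assumption that each `render_*` function returns a plain `String` is not confirmed here):

```julia
import Mycelia

# Minimal sketch: one JobSpec, four renderers.
# Assumes each render_* function returns the script/command as a String.
job = Mycelia.JobSpec(
    job_name="demo", cmd="julia pipeline.jl", site=:scg,
    partition="batch", account="PI_SUNetID",
    time_limit="01:00:00", nodes=1, cpus_per_task=4, mem_gb=16
)

sbatch_script  = Mycelia.render_sbatch(job)      # SLURM batch script
salloc_command = Mycelia.render_salloc(job)      # interactive allocation command
docker_script  = Mycelia.render_docker_run(job)  # docker run wrapper script
cloudbuild_yml = Mycelia.render_cloudbuild(job)  # Cloud Build YAML
```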
## Why This Exists
The goal is one validated job definition that can target:
- NERSC Perlmutter
- LBL Lawrencium
- Stanford SCG
- local Docker
- Google Cloud Build
## Quick Example
```julia
job = Mycelia.JobSpec(
    job_name="example-job",
    cmd="julia pipeline.jl",
    site=:scg,
    partition="nih_s10",
    account="PI_SUNetID",
    time_limit="24:00:00",
    nodes=1,
    cpus_per_task=12,
    mem_gb=96
)

Mycelia.submit(job; dry_run=true)
```

## NERSC Notes
### Login Node Limits
NERSC login nodes are constrained by cgroups; do not run heavy compute directly on login nodes.
- Memory limit per user: 56 GB
- Aggregate CPU throttling: 25%
Use SLURM allocations for substantial compute.
### Constraint + GPU Request Rules
- CPU nodes: `constraint="cpu"`
- GPU nodes: `constraint="gpu"`
- Large-memory GPU nodes: `constraint="gpu&hbm80g"`
- Standard-memory GPU nodes can use `constraint="gpu&hbm40g"`

If you request a GPU constraint, you must also request GPUs (`gpus_per_node` or `gpus_per_task`), as in the sketch below.
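A minimal sketch of a GPU batch job, assuming the `constraint` and `gpus_per_node` keywords map directly onto `JobSpec` fields and that `:perlmutter` is the NERSC site symbol (both are assumptions; the account string is a placeholder):

```julia
import Mycelia

# Sketch only: keyword names, the :perlmutter site symbol, and the account
# value are assumptions/placeholders, not confirmed API.
gpu_job = Mycelia.JobSpec(
    job_name="gpu-example",
    cmd="julia --project gpu_pipeline.jl",
    site=:perlmutter,           # assumed site symbol for NERSC Perlmutter
    constraint="gpu&hbm80g",    # large-memory GPU nodes (see list above)
    gpus_per_node=4,            # required whenever a GPU constraint is requested
    account="mXXXX",            # placeholder NERSC project account
    time_limit="04:00:00",
    nodes=1
)
script = Mycelia.render_sbatch(gpu_job)
```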
### Interactive Jobs
Use `qos="interactive"` and render with `render_salloc`:

```julia
cmd = Mycelia.render_salloc(job)
```

For `gpu&hbm80g`, command rendering quotes the constraint correctly (`-C "gpu&hbm80g"`).
### Monitoring
Use these commands to observe jobs and resource usage:

- `squeue -u $USER`
- `scontrol show job <jobid>`
- `sacct -j <jobid> --format=JobID,JobName,AllocCPUS,Elapsed,State,ExitCode,MaxRSS`
- `sstat -j <jobid>.batch --format=JobID,MaxRSS,MaxVMSize,AvgCPU`
- `seff <jobid>` (if available)
- `nvidia-smi`
- `nvidia-smi topo -m`

`Mycelia.summarize_job(jobid)` provides a best-effort aggregated view.
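For example (the job id below is illustrative, and whether the argument should be a string or an integer is an assumption):

```julia
import Mycelia

# Best-effort aggregated view of the accounting/efficiency data listed above.
Mycelia.summarize_job("12345678")
```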
## Lawrencium Notes
Lawrencium requires both `partition` and `account`. QoS must match your allowed associations.

Helpers:

```julia
Mycelia.list_lawrencium_associations()
Mycelia.list_lawrencium_qos_limits()
```

Association/QoS parsing is best-effort and produces warnings instead of hard failures when lookup is unavailable.
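A hedged sketch of a Lawrencium job, assuming `:lawrencium` is the site symbol and that `qos` is a `JobSpec` keyword; the partition, account, and QoS strings below are hypothetical placeholders to be replaced with values from your own associations:

```julia
import Mycelia

# Check your allowed partition/account/QoS combinations first.
Mycelia.list_lawrencium_associations()

# Sketch only: the :lawrencium site symbol, the qos keyword, and all
# partition/account/qos values are assumptions or placeholders.
lr_job = Mycelia.JobSpec(
    job_name="lr-example",
    cmd="julia pipeline.jl",
    site=:lawrencium,        # assumed site symbol
    partition="lr6",         # hypothetical partition
    account="pc_mylab",      # hypothetical account
    qos="lr_normal",         # must match an allowed association
    time_limit="12:00:00",
    nodes=1
)
Mycelia.submit(lr_job; dry_run=true)
```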
## SCG Notes
Supported partitions:

- `batch`
- `interactive`
- `nih_s10`

Rules encoded by the validator:

- `nodes=1` expected for SCG
- `batch` and `nih_s10` require `account` (`PI_SUNetID` format)
- `time_limit` required
## Container Portability
### Docker

```julia
docker_script = Mycelia.render_docker_run(job)
```

### Cloud Build

```julia
yaml = Mycelia.render_cloudbuild(job)
```

Cloud Build rendering supports either:
- build + push from a local `Dockerfile`, or
- a prebuilt container image (`container_image`)
Cloud Build is CPU-oriented unless additional GPU infrastructure is configured separately.
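As a hedged sketch of the prebuilt-image path (the image reference is a hypothetical placeholder, and which other `JobSpec` fields Cloud Build rendering requires is an assumption not covered here):

```julia
import Mycelia

# Sketch only: the image reference is a placeholder; required fields beyond
# container_image and cmd are assumptions.
cb_job = Mycelia.JobSpec(
    job_name="cloudbuild-example",
    cmd="julia pipeline.jl",
    container_image="us-docker.pkg.dev/my-project/my-repo/pipeline:latest"  # hypothetical image
)
yaml = Mycelia.render_cloudbuild(cb_job)
```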
### HPC Containers
When `container_image` is supplied, SLURM renderers can execute via site-appropriate container engines (defaults vary by site; override via `container_engine`).
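A hedged sketch, reusing the SCG fields from the Quick Example; the image reference and the `"apptainer"` engine value are hypothetical, and the `container_image`/`container_engine` keyword names are taken from the sentence above rather than a confirmed constructor signature:

```julia
import Mycelia

# Sketch only: image reference and engine value are placeholders.
cjob = Mycelia.JobSpec(
    job_name="containerized-example",
    cmd="julia pipeline.jl",
    site=:scg,
    partition="batch",
    account="PI_SUNetID",
    time_limit="02:00:00",
    nodes=1,
    container_image="ghcr.io/example/pipeline:latest",  # hypothetical image
    container_engine="apptainer"                         # override the site default (assumed value)
)
script = Mycelia.render_sbatch(cjob)
```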