Running Jobs With SLURM

The CRC uses the Slurm Workload Manager as the job submission and scheduling interface for running computational workloads on our resources. Slurm has many configurable parameters that we, as administrators, can adjust to help ensure fair access to our resources, as well as options that let users specify the requirements of their jobs more precisely.

A "Job" is an allocation of compute resources assigned to your CRC user account for some amount of time by the Slurm Workload Manager. Once allocated, you can use those resources to run commands and process data. 

A "Batch Job" is a type of job that is specified fully by a .slurm submission script. This script contains information about the resources you need and the commands to run once those resources are allocated to you. As opposed to interactive sessions, batch jobs run unattended. Once your job is queued, you can use Slurm commands like squeue and sacct to check its status.
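
For example (htc is used here only as an illustrative cluster name; replace <jobid> with the ID that sbatch prints when you submit), you could check on a job with:

[nlc60@login0b ~] : squeue -M htc -u $USER
[nlc60@login0b ~] : sacct -M htc -j <jobid>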

Submitting a Batch Job

When you access the clusters, you will start on a login node. Your data processing should not be run here.

[nlc60@login0b ~] : hostname
login0b.htc.sam.pitt.edu

Use a batch job to receive an allocation of compute resources and have your commands run there.

sbatch is the Slurm command used to submit a script (such as a .slurm submission script) as a batch job. Here is a simple example that submits a bash script as a batch job.

[nlc60@login0b ~] : cat hello_world.sh
#!/bin/bash
echo "hello world"
crc-job-stats

[nlc60@login0b ~] : sbatch hello_world.sh
Submitted batch job 946479

[nlc60@login0b ~] : cat slurm-946479.out
hello world
============================================================================
                               JOB STATISTICS
============================================================================
      SubmitTime: 2022-06-17T12:41:39
         EndTime: 2022-06-17T13:41:39
         RunTime: 00:00:00
           JobId: 946479
            TRES: cpu=1,mem=8000M,node=1,billing=1
       Partition: htc
        NodeList: htc-n0
         Command: /ihome/sam/nlc60/hello_world.sh
          StdOut: /ihome/sam/nlc60/slurm-946479.out
More information:
    - `sacct -M htc -j 946479 -S 2022-06-17T12:41:39 -E 2022-06-17T13:41:39`
   Print control:
    - List of all possible fields: `sacct --helpformat`
    - Add `--format=<field1,field2,etc>` with fields of interest
============================================================================

Note the call to the crc-job-stats wrapper script. It is a useful addition to any script submitted to Slurm, as its output shows you how to find more information if something goes wrong.
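
For example, following the "Print control" hints in the output above (the fields listed here are common sacct fields, chosen purely for illustration), you could request a compact summary of the finished job:

[nlc60@login0b ~] : sacct -M htc -j 946479 --format=JobID,JobName,State,Elapsed,MaxRSS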

 

Specifying Job Requirements 

You can add more detail to the requirements of your job by providing sbatch arguments. For example:

[nlc60@login0b ~] : sbatch --nodes=1 --time=0-00:01:00 --ntasks-per-node=1 --qos=short --cluster=smp --partition=smp hello_world.sh

The benefit of using a .slurm submission script is that these arguments are more easily read and reproduced.

Converting the above example from hello_world.sh to hello_world.slurm gives:

[nlc60@login0b ~] : cat hello_world.slurm
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0-00:01:00
#SBATCH --ntasks-per-node=1
#SBATCH --qos=short
#SBATCH --cluster=smp
#SBATCH --partition=smp
echo "hello world"
crc-job-stats

The Slurm submission script defines the additional arguments as lines starting with '#SBATCH'.

sbatch stops processing #SBATCH directives once it reaches the first non-comment, non-whitespace line in the script, so place all of your directives at the top, before any commands.
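
As an illustrative sketch (not a complete submission script), any directive placed after the first command is treated as an ordinary comment and ignored:

#!/bin/bash
# This directive is processed (it appears before the first command):
#SBATCH --time=0-00:01:00
echo "hello world"
# This directive is ignored (sbatch stopped reading directives at the echo above):
#SBATCH --ntasks-per-node=4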

See our User Manual page on SLURM batch jobs for a more detailed list of common SBATCH directives.

 

Running the .slurm Submission Script

[nlc60@login0b ~] : sbatch hello_world.slurm
Submitted batch job 6041483 on cluster smp

[nlc60@login0b ~] : cat slurm-6041483.out
hello world
============================================================================
                               JOB STATISTICS
============================================================================
      SubmitTime: 2022-06-17T13:19:29
         EndTime: 2022-06-17T13:20:30
         RunTime: 00:00:00
           JobId: 6041483
            TRES: cpu=1,mem=4018M,node=1
       Partition: smp
        NodeList: smp-n26
         Command: /ihome/sam/nlc60/hello_world.slurm
          StdOut: /ihome/sam/nlc60/slurm-6041483.out
More information:
    - `sacct -M smp -j 6041483 -S 2022-06-17T13:19:29 -E 2022-06-17T13:20:30`
   Print control:
    - List of all possible fields: `sacct --helpformat`
    - Add `--format=<field1,field2,etc>` with fields of interest
============================================================================

 

GPU Jobs

To submit a GPU job, you only need to make minor changes to the above script: change the cluster and partition names and specify the number of requested GPUs:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0-00:01:00
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --cluster=gpu
#SBATCH --partition=a100

<USER-SPECIFIC COMMAND FOR GPU CODE TO BE EXECUTED>
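
For instance (purely as an illustration; the module name and script below are hypothetical placeholders for your own environment and code), the final lines of such a script might look like:

module load cuda            # hypothetical module name; check what is available with `module avail`
nvidia-smi                  # confirm that the allocated GPU is visible inside the job
python my_gpu_script.py     # hypothetical user script that runs on the GPU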