Running Jobs With SLURM
The CRC uses the SLURM Workload Manager as the job submission and scheduling interface for users to run their computational workloads on our resources. SLURM has many configurable parameters that we, as administrators, adjust to help ensure fair access to our resources, as well as options that let users specify the requirements of their jobs more precisely.
A "Job" is an allocation of compute resources assigned to your CRC user account for some amount of time by the Slurm Workload Manager. Once allocated, you can use those resources to run commands and process data.
A "Batch Job" is a type of job that is specified fully by a .slurm submission script. This script contains information about the resources you need and the commands to be run once those resources are allocated to you. As opposed to interactive sessions, batch jobs run unattended. Once queued, your job you can use slurm commands like squeue and sacct to check the job's status.
Submitting a Batch Job
When you access the clusters, you will start on a login node. Your data processing should not be run there.
[nlc60@login0b ~] : hostname
login0b.htc.sam.pitt.edu
Use a batch job to receive an allocation of compute resources and have your commands run on them.
sbatch is the Slurm command for submitting a script, such as a bash script or a .slurm submission script, as a batch job. Here is a simple example that submits a bash script as a batch job.
[nlc60@login0b ~] : cat hello_world.sh
#!/bin/bash
echo "hello world"
crc-job-stats

[nlc60@login0b ~] : sbatch hello_world.sh
Submitted batch job 946479

[nlc60@login0b ~] : cat slurm-946479.out
hello world

============================================================================
                              JOB STATISTICS
============================================================================

SubmitTime: 2022-06-17T12:41:39
EndTime: 2022-06-17T13:41:39
RunTime: 00:00:00
JobId: 946479
TRES: cpu=1,mem=8000M,node=1,billing=1
Partition: htc
NodeList: htc-n0
Command: /ihome/sam/nlc60/hello_world.sh
StdOut: /ihome/sam/nlc60/slurm-946479.out

More information:
 - `sacct -M htc -j 946479 -S 2022-06-17T12:41:39 -E 2022-06-17T13:41:39`

Print control:
 - List of all possible fields: `sacct --helpformat`
 - Add `--format=<field1,field2,etc>` with fields of interest

============================================================================
Note the call to the crc-job-stats wrapper at the end of the script. It is a useful addition to any script submitted to Slurm, as it prints a summary of the job and shows you how to find more information if something goes wrong.
Specifying Job Requirements
You can add more detail to the requirements of your job by providing sbatch arguments. For example:
[nlc60@login0b ~] : sbatch --nodes=1 --time=0-00:01:00 --ntasks-per-node=1 --qos=short --cluster=smp --partition=smp hello_world.sh
The benefit of using a .slurm submission script is that these arguments are more easily read and reproduced.
Converting the above example from hello_world.sh to hello_world.slurm gives:
[nlc60@login0b ~] : cat hello_world.slurm
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0-00:01:00
#SBATCH --ntasks-per-node=1
#SBATCH --qos=short
#SBATCH --cluster=smp
#SBATCH --partition=smp

echo "hello world"
crc-job-stats
The .slurm submission script defines these additional arguments as lines starting with #SBATCH.
sbatch will stop processing #SBATCH directives once the first non-comment, non-whitespace line in the script is reached.
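For example, in the following sketch the second directive is silently ignored because it appears after the first command:

#!/bin/bash

# Processed: this directive appears before the first command
#SBATCH --nodes=1

# Directive parsing stops at the first non-comment, non-whitespace line:
echo "hello world"

# Ignored: this directive appears after a command has been reached
#SBATCH --time=0-00:05:00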
See our User Manual page on SLURM batch jobs for a more detailed list of common SBATCH directives.
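For quick reference, a few commonly used directives are sketched below; the values and email address shown are placeholders:

# Name shown by squeue
#SBATCH --job-name=my_job

# Output file; %j expands to the job ID
#SBATCH --output=my_job_%j.out

# Memory per node
#SBATCH --mem=8G

# Email notifications when the job ends or fails
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@pitt.edu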
Running the .slurm Submission Script
[nlc60@login0b ~] : sbatch hello_world.slurm
Submitted batch job 6041483 on cluster smp

[nlc60@login0b ~] : cat slurm-6041483.out
hello world

============================================================================
                              JOB STATISTICS
============================================================================

SubmitTime: 2022-06-17T13:19:29
EndTime: 2022-06-17T13:20:30
RunTime: 00:00:00
JobId: 6041483
TRES: cpu=1,mem=4018M,node=1
Partition: smp
NodeList: smp-n26
Command: /ihome/sam/nlc60/hello_world.slurm
StdOut: /ihome/sam/nlc60/slurm-6041483.out

More information:
 - `sacct -M smp -j 6041483 -S 2022-06-17T13:19:29 -E 2022-06-17T13:20:30`

Print control:
 - List of all possible fields: `sacct --helpformat`
 - Add `--format=<field1,field2,etc>` with fields of interest

============================================================================
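Because this job was submitted with --cluster=smp, Slurm commands run from the login node need the -M flag to find it. A minimal sketch, using the job ID from the run above:

# Jobs on another cluster are only visible when you pass -M <cluster>
squeue -M smp -u $USER
sacct -M smp -j 6041483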
GPU Jobs
If you want to submit a GPU job, you only need to make some minor changes to the above script, namely changing the cluster and partition names as well as specifying the number of requested GPUs:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0-00:01:00
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --cluster=gpu
#SBATCH --partition=a100

USER_SPECIFIC COMMAND FOR GPU CODE TO BE EXECUTED
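As a sketch of what that user-specific command might look like, the lines below assume a CUDA-capable Python script named run_gpu_code.py, which is a hypothetical placeholder; nvidia-smi is included only to confirm that the allocated GPU is visible inside the job:

# Confirm the allocated GPU is visible inside the job
nvidia-smi

# Run your GPU code (run_gpu_code.py is a hypothetical placeholder)
python run_gpu_code.py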