Slurm jobs

The three most important commands in Slurm are sbatch, srun and scancel. sbatch is used to submit a job script, like the one below (called example.sbatch), to the queue. srun is used to run parallel jobs on compute nodes. Jobs can be canceled with scancel.
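For example, a typical submit, monitor and cancel cycle looks like the following (the job ID is illustrative; squeue is the standard Slurm command for monitoring the queue):

sbatch example.sbatch    # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER          # list your pending and running jobs
scancel 389675           # cancel a job by its job ID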

#!/bin/bash
#
#SBATCH -N 1 # Ensure that all cores are on one machine
#SBATCH -t 0-01:00 # Runtime in D-HH:MM
 
#SBATCH --cpus-per-task=4 # Request that ncpus be allocated per process.
#SBATCH --mem=10g # Memory pool for all cores (see also --mem-per-cpu)
 
# This job requires 4 CPUs (4 CPUs per task). Allocate 4 CPUs from 1 node in the default partition.
 
# No cd needed: by default, SLURM starts the job in the directory from which the script was submitted.
 
module load hisat2/2.1.0

# Build a HISAT2 index from the reference FASTA
hisat2-build ./reference/22_20-21M.fa 22_20-21M_hisat
# Align single-end reads using the CPUs allocated to this task
hisat2 -p $SLURM_CPUS_PER_TASK -x 22_20-21M_hisat -U ./reads/reads_1.fq -S eg1.sam
# Align paired-end reads
hisat2 -p $SLURM_CPUS_PER_TASK -x 22_20-21M_hisat -1 ./reads/reads_1.fq -2 ./reads/reads_2.fq -S eg2.sam
  • NOTE: requests for walltime extensions will not be granted

This is an example job script that runs the HISAT2 examples. To run this script, copy the HISAT example folder:

cp -r /ihome/sam/apps/HISAT/hisat-0.1.6-beta/example .
cd example

and generate a text file named example.sbatch with contents like the script above.

This job is submitted with the command sbatch example.sbatch. By default, standard output is redirected to slurm-<jobid>.out.

[fangping@login0a example]$ sbatch example.sbatch
Submitted batch job 389675
[fangping@login0a example]$ head slurm-389675.out
Settings:
  Output files: "22_20-21M_hisat.*.ht2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  • Note: By default the working directory of your job is the directory from which the batch script was submitted. See below for more information about job environments.
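Your script can also rely on the environment variable SLURM_SUBMIT_DIR, which Slurm sets to that submission directory. A minimal sketch (the scratch path is illustrative):

cd /scratch/my_workdir     # illustrative: move elsewhere for intermediate files
# ... commands that write temporary output ...
cd "$SLURM_SUBMIT_DIR"     # return to the directory sbatch was invoked from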

The sbatch arguments here are the minimal subset required to accurately specify a job on the HTC cluster. Please refer to man sbatch for more options.

sbatch argument      Description
-N, --nodes          Maximum number of nodes to be used by each job step.
--ntasks-per-node    Number of tasks to be launched per node.
--cpus-per-task      Advise the SLURM controller that ensuing job steps will require ncpus processors per task.
-e, --error          File to which standard error is redirected.
-J, --job-name       The job name.
-t, --time           Total time required for the job, in the format days-hh:mm:ss.
--qos                Quality of Service to be used. The default is normal.
--partition          Partition to submit the job to. The only (and default) partition is htc.

The above arguments can be provided in a batch script by preceding them with #SBATCH. Note that the shebang (#!) line must be present. The shebang line can call any shell or scripting language available on the cluster, for example #!/bin/bash, #!/bin/tcsh, #!/usr/bin/env python or #!/usr/bin/env perl.
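For example, a script header combining several of these options might look like the following sketch (the job name, time limit, and error-file name are illustrative):

#!/bin/bash
#
#SBATCH --job-name=my_job          # -J: name shown in the queue
#SBATCH --nodes=1                  # -N: run on a single node
#SBATCH --ntasks-per-node=1        # one task on that node
#SBATCH --cpus-per-task=4          # four CPUs for that task
#SBATCH --time=0-02:00:00          # -t: days-hh:mm:ss
#SBATCH --error=my_job.err         # -e: file for standard error
#SBATCH --qos=normal               # the default QOS
#SBATCH --partition=htc            # the only (and default) partition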

srun also takes the --nodes, --ntasks-per-node and --cpus-per-task arguments, allowing each job step to change the resources it uses, but a job step cannot exceed the resources given to sbatch.
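A minimal sketch of this (the program names are placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4

srun --ntasks=4 ./step_one    # this step uses the full allocation
srun --ntasks=2 ./step_two    # a step may request fewer tasks, but never more than 4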

Slurm is very explicit in how one requests cores and nodes. While extremely powerful, the three flags --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.

--ntasks vs --cpus-per-task

The term “task” in this context can be thought of as a “process”. Therefore, a multi-process program (e.g., MPI) is comprised of multiple tasks. In Slurm, tasks are requested with the --ntasks flag. A multithreaded program is comprised of a single task, which can in turn use multiple CPUs. CPUs, for multithreaded programs, are requested with the --cpus-per-task flag. Individual tasks cannot be split across multiple compute nodes, so requesting a number of CPUs with the --cpus-per-task flag will always result in all of your CPUs being allocated on the same compute node.
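A sketch of the two request styles (program names are placeholders):

# A multi-process (e.g., MPI) program: 8 single-CPU tasks, which may span nodes
#SBATCH --ntasks=8
srun ./my_mpi_program

# A multithreaded program: one task using 8 CPUs, always on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
./my_threaded_program    # pass $SLURM_CPUS_PER_TASK as its thread count, as hisat2 -p does above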

Example batch scripts and NGS data analysis pipelines

Scripts to perform RNASeq data analysis using HISAT2 + Stringtie are available under /ihome/sam/fangping/example/RNASeq_HISAT2_Stringtie. You can follow the readme file to go through the steps.

Examples of NGS data analysis pipelines are available at NGS Data Analysis. If you need personalized consultation for NGS data analysis workflow and selection of better pipelines, please contact Fangping Mu, PhD.

Submitting multiple jobs to the HTC cluster

For examples of submitting multiple jobs to the HTC cluster, check this link.