The HTC cluster is designed to run high-throughput computing jobs efficiently, in support of bioinformatics and health sciences research.
Table of contents:
- Access to HTC
- Off-campus access (setting up the VPN)
- On-campus access
- Node configuration
- Application environment
- Installed packages
- Slurm Workload Manager
- Slurm jobs
- Service unit
- Fairshare and priority
- Example batch scripts
- PBS to Slurm commands
- CRC wrappers
CRC computational resources are housed off-campus at the University’s main datacenter at RIDC Park. CRC clusters are firewalled, so you cannot access them directly when you are off-campus. You should first pass the firewall using the VPN service, and then connect to the CRC clusters.
If you are off-campus, the cluster is securely accessible from anywhere in the world via Virtual Private Networking (VPN), a service of CSSD. VPN requires certain software to run on your system, and multiple alternatives are available to cover almost all systems and configurations.
Download and install Pulse VPN and follow the setup instructions.
VPNC is a command-line VPN application that may be the most convenient option for some Linux users.
Most distributions provide prebuilt binaries, or you can get the source and build your own:
- Ubuntu: sudo apt-get install vpnc
- SUSE: sudo zypper install vpnc
- CentOS/RedHat: sudo yum install vpnc
- Arch Linux: sudo pacman -S vpnc
- All (source): http://www.unix-ag.uni-kl.de/~massar/vpnc/
Once installed, download the configuration file here (requires login) and move the file to /etc/vpnc/pitt.conf. Then:
- Replace REPLACEME in the file with your Pitt username
- Run sudo vpnc pitt and then enter your Pitt password. To stop, run sudo vpnc-disconnect
- vpnc-disconnect stops only the most recently started vpnc process
- To stop all of them, run sudo killall vpnc
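The REPLACEME substitution described above can be scripted. The following is a minimal sketch and not an official CRC tool: the function name set_vpnc_username and the username abc123 are hypothetical, and it assumes the downloaded file contains the literal string REPLACEME.

```shell
# Hypothetical helper: put a Pitt username in place of the REPLACEME
# placeholder inside a vpnc configuration file.
set_vpnc_username() {
    local conf="$1" user="$2"
    # Accept only letters, digits, dot, underscore and hyphen, so the
    # value cannot break the sed expression below.
    case "$user" in
        ''|*[!A-Za-z0-9._-]*)
            echo "invalid username: $user" >&2
            return 1
            ;;
    esac
    # Edit in place and keep the original as <file>.bak.
    sed -i.bak "s/REPLACEME/${user}/g" "$conf"
}

# Usage (the file under /etc needs root, so run the script with sudo):
#   set_vpnc_username /etc/vpnc/pitt.conf abc123
```

The backup copy makes it easy to start again if the wrong username was entered.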
To use CRC resources, users must first have a valid Pitt ID and then formally request an account. Once you have valid login credentials, the clusters can be accessed via SSH. For example, to connect to HTC:
$ ssh pittID@htc.crc.pitt.edu
Your username is your PittID and your password is the same as your campus-wide Pitt password.
To check whether you can use the cluster, run sacctmgr list user PittID:
[fangping@login0a example]$ sacctmgr list user fangping
      User   Def Acct     Admin
---------- ---------- ---------
  fangping        sam      None
If you do not see your ID listed, you have not been granted access to this cluster. If you believe that you should have access, please submit a ticket.
Open your favorite terminal emulator
There are 20 compute nodes in total with the following configuration:
- 16 E5-2660 v3 (Haswell) nodes
  - 2.40 GHz, 16 cores
  - 256 GB RAM 2133 MHz
  - 256 GB SSD
  - 56 Gb/s FDR InfiniBand
- 4 E5-2643 v4 (Broadwell) nodes
  - 3.40 GHz, 16 cores
  - 256 GB RAM
  - 256 GB SSD
  - 56 Gb/s FDR InfiniBand
- 4 Xeon Gold 6126 (Skylake) nodes
  - 2.60 GHz, 24 cores
  - 377 GB RAM
  - 256 GB SSD & 500 GB SSD
  - 56 Gb/s FDR InfiniBand
There are two login nodes that can be used for compilation.
- E5-2620 v3 (Haswell)
  - 2.40 GHz, 12 cores (24 hyperthreads)
  - 64 GB RAM 1867 MHz
  - 56 Gb/s FDR InfiniBand
For performance reasons the following configuration has been chosen for compute nodes and login nodes.
- Red Hat Enterprise Linux 7.6
All nodes in the HTC cluster mount the following file servers.
|BeeGFS, not backed up||/bgfs|
|ZFS, not backed up, 7-day snapshots||/zfs1, /zfs2|
|Scratch (compute only)||/scratch|
It is important to note that the $HOME directories are shared with other clusters and configuration files may not be compatible. Please check through your .bashrc, .bash_profile and all other dotfiles if you encounter problems.
GNU 4.8.5 compilers are available in your path when you log in. Newer GNU 8.2.0 compilers are available as module environments.
Currently, HTC cluster does not support distributed parallel MPI jobs. Only shared memory parallel jobs are supported.
|Compiler||Version||executable name||AVX2 support|
See the man pages man <executable> for more information about flags.
- GCC 8.2.0 is available through the Lmod Application Environment. See below.
The Haswell CPUs support AVX2 instructions. The GCC 8.2.0 compiler supports AVX2 with the -march=core-avx2 flag. The login nodes have the same architecture as the compute nodes.
Lmod is used by cluster administrators to provide optimized builds of commonly used software. Applications are available to users through the Lmod modular environment commands. There are no default modules loaded when you log in.
Use the command "module spider" to list all installed applications. The architecture for the HTC Cluster is called haswell, which means that codes have been compiled to utilize the AVX2 instruction set as best as possible.
We have implemented a hierarchical structure of module files. Use "module avail" to list "core" modules:
[fangping@login0a ~]$ module avail
-------------------------------------------- /ihome/crc/modules/Core --------------------------------------------
   abaqus/2016-vandegeest        matlab/R2017a (D)
   abaqus/2017-general           matlab/R2018a
   abaqus/2017-vandegeest (D)    maven/3.5.0
   adf/2017.108                  medea/2.22.3
   admixmap/3.8.3103             meerkat/0.189
   afni/18.0.22                  meme/5.0.3
   ...
To list modules compiled using GCC 8.2.0, run "module load gcc/8.2.0" then "module avail".
Module environment files have been created for each of these packages and can be easily loaded into your shell with "module load <packagename>" for "core" modules. To load modules compiled using GCC 8.2.0, for example r/3.5.1, use "module load gcc/8.2.0 r/3.5.1".
In the example below I have loaded the HISAT2 package into my environment. The executables, such as hisat2 and hisat2-build, are now in my PATH.
[fangping@login0a ~]$ module load hisat2/2.1.0
[fangping@login0a ~]$ which hisat2
/ihome/crc/install/hisat2/hisat2-2.1.0/hisat2
[fangping@login0a ~]$ which hisat2-build
/ihome/crc/install/hisat2/hisat2-2.1.0/hisat2-build
You can check which modules are “loaded” in your environment by using the command module list:
[fangping@login0a ~]$ module list
Currently Loaded Modules:
  1) hisat2/2.1.0
To unload or remove a module, just use the unload option with the module command, but you have to specify the complete name of the environment module:
[fangping@login0a ~]$ module unload hisat2/2.1.0
[fangping@login0a ~]$ module list
No modules loaded
Alternatively, you can unload all loaded environment modules using module purge.
- Several reference genome datasets are available at /mnt/mobydisk/pan/genomics/refs
Slurm Workload Manager
The HTC cluster uses Slurm for batch job queuing. 16 compute nodes belong to the htc partition and it is the default partition. The sinfo command provides an overview of the state of the nodes within the cluster.
[fangping@login0a ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
htc*         up 6-00:00:00      4    mix n[410,413,417,427]
htc*         up 6-00:00:00     16  alloc n[409,411-412,414-416,418-426,428]
Nodes in the alloc state are fully allocated to running jobs, while nodes in the mix state have some CPUs allocated and others idle. The asterisk next to the htc partition means that it is the default partition for all jobs.
squeue shows the list of running and queued jobs.
The most common states for jobs in squeue are described below. See man squeue for more details.
|CA||CANCELLED||Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.|
|CD||COMPLETED||Job has terminated all processes on all nodes.|
|CG||COMPLETING||Job is in the process of completing. Some processes on some nodes may still be active.|
|F||FAILED||Job terminated with non-zero exit code or other failure condition.|
|PD||PENDING||Job is awaiting resource allocation.|
|R||RUNNING||Job currently has an allocation.|
|TO||TIMEOUT||Job terminated upon reaching its time limit.|
See man squeue for a complete description of the possible REASONS for pending jobs.
To see when all jobs are expected to start, run squeue --start.
The scontrol command shows detailed information about a job:
scontrol show job <jobid>
- Note: not all jobs have a definite start time.
The three most important commands in Slurm are sbatch, srun and scancel. sbatch submits a job script, such as the one below called example.sbatch, to the queue. srun runs parallel jobs on compute nodes. Jobs can be canceled with scancel.
#!/bin/bash
#
#SBATCH -N 1 # Ensure that all cores are on one machine
#SBATCH -t 0-01:00 # Runtime in D-HH:MM
#SBATCH --cpus-per-task=4 # Request that ncpus be allocated per process.
#SBATCH --mem=10g # Memory pool for all cores (see also --mem-per-cpu)
#
# This job requires 4 CPUs (4 CPUs per task). Allocate 4 CPUs from 1 node in the default partition.
#
# Change to the directory that the script was launched from. This is the default for SLURM.
#
module load hisat2/2.1.0
hisat2-build ./reference/22_20-21M.fa 22_20-21M_hisat
hisat2 -p $SLURM_CPUS_PER_TASK -x 22_20-21M_hisat -U ./reads/reads_1.fq -S eg1.sam
hisat2 -p $SLURM_CPUS_PER_TASK -x 22_20-21M_hisat -1 ./reads/reads_1.fq -2 ./reads/reads_2.fq -S eg2.sam
- NOTE: requests for walltime extensions will not be granted
This is an example job script that runs the HISAT2 examples. To run it, copy the HISAT example folder with cp -r /ihome/sam/apps/HISAT/hisat-0.1.6-beta/example .; cd example, and create a text file named example.sbatch with the contents shown above. Submit the job with the command sbatch example.sbatch. By default, standard output is redirected to slurm-<jobid>.out.
[fangping@login0a example]$ sbatch example.sbatch
Submitted batch job 389675
[fangping@login0a example]$ head slurm-389675.out
Settings:
  Output files: "22_20-21M_hisat.*.ht2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
- Note: By default the working directory of your job is the directory from which the batch script was submitted. See below for more information about job environments.
The sbatch arguments here are the minimal subset required to accurately specify a job on the htc cluster. Please refer to man sbatch for more options.
|-N --nodes||Maximum number of nodes to be used by each Job Step.|
|--tasks-per-node||Specify the number of tasks to be launched per node.|
|--cpus-per-task||Advise the SLURM controller that ensuing job steps will require ncpus number of processors per task.|
|-e --error||File to redirect standard error.|
|-J --job-name||The job name.|
|-t --time||Define the total time required for the job. The format is days-hh:mm:ss.|
|--qos||Declare the Quality of Service to be used. The default is normal.|
|--partition||Select the partition to submit the job to. The only and default partition is htc.|
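Because --time expects the days-hh:mm:ss format, a small helper can turn a duration in seconds into that form. This is only an illustrative sketch; the function name seconds_to_slurm_time is hypothetical and is not part of Slurm.

```shell
# Hypothetical helper: convert a number of seconds into the
# days-hh:mm:ss form accepted by sbatch --time.
seconds_to_slurm_time() {
    local total="$1"
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# Example: request 26.5 hours of walltime for a job script.
#   sbatch --time="$(seconds_to_slurm_time 95400)" example.sbatch
```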
The above arguments can be provided in a batch script by preceding them with #SBATCH. Note that the shebang (#!) line must be present. The shebang line can call any shell or scripting language available on the cluster. For example, #!/bin/bash, #!/bin/tcsh, #!/bin/env python or #!/bin/env perl.
srun also takes the --nodes, --tasks-per-node and --cpus-per-task arguments to allow each job step to change the utilized resources but they cannot exceed those given to sbatch.
Slurm is very explicit in how one requests cores and nodes. While extremely powerful, the three flags, --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.
--ntasks vs --cpus-per-task
The term “task” in this context can be thought of as a “process”. Therefore, a multi-process program (e.g. MPI) is comprised of multiple tasks. In Slurm, tasks are requested with the --ntasks flag. A multi-threaded program is comprised of a single task, which can in turn use multiple CPUs. CPUs, for the multithreaded programs, are requested with the --cpus-per-task flag. Individual tasks cannot be split across multiple compute nodes, so requesting a number of CPUs with --cpus-per-task flag will always result in all your CPUs allocated on the same compute node.
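As a sketch of how a multi-threaded job can pick up its allocation, the snippet below reads $SLURM_CPUS_PER_TASK, the same variable used in the HISAT2 example script. The function name thread_count is hypothetical, the fallback to 1 thread outside a job is an assumption, and OMP_NUM_THREADS only matters for programs built with OpenMP.

```shell
# Hypothetical helper for use inside a job script: report how many
# threads the program should start.
thread_count() {
    # Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
    # Outside a job the variable is unset, so fall back to 1 thread.
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# OpenMP programs read their thread count from OMP_NUM_THREADS.
export OMP_NUM_THREADS="$(thread_count)"

# Programs with their own thread option can take the same value, e.g.:
#   hisat2 -p "$(thread_count)" -x 22_20-21M_hisat -U ./reads/reads_1.fq -S eg1.sam
```

Deriving the thread count from the allocation keeps the program from starting more threads than the CPUs it was given.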
Example batch scripts and NGS data analysis pipelines
Scripts to perform RNASeq data analysis using HISAT2 + Stringtie are available under /ihome/sam/fangping/example/RNASeq_HISAT2_Stringtie. You can follow the readme file to go through the steps.
Examples of NGS data analysis pipelines are available at NGS Data Analysis. If you need personalized consultation for NGS data analysis workflow and selection of better pipelines, please contact me (email@example.com).
Submitting multiple Jobs to HTC cluster
Examples to submit multiple Jobs to HTC cluster
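One simple way to submit several jobs is to loop over the input files and call sbatch once per file. The sketch below is an illustration under stated assumptions, not the CRC-provided example: the function name submit_per_sample, the SUBMIT variable, the script name align.sbatch and the .fq file names are all hypothetical. By default it only prints the commands it would run.

```shell
# Hypothetical helper: submit one job per input file.
# SUBMIT defaults to "echo sbatch", which only prints each command.
# Set SUBMIT=sbatch to really submit the jobs.
submit_per_sample() {
    local script="$1"
    shift
    local sample
    for sample in "$@"; do
        # Arguments after the script name are passed to the job script,
        # where the input file is available as $1.
        ${SUBMIT:-echo sbatch} --job-name="$(basename "$sample" .fq)" "$script" "$sample"
    done
}

# Dry run:
#   submit_per_sample align.sbatch reads/*.fq
# Real submission:
#   SUBMIT=sbatch submit_per_sample align.sbatch reads/*.fq
```

Doing a dry run first makes it easy to check job names and paths before anything enters the queue.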
PBS to Slurm commands
PBS Torque and SLURM are two frameworks for specifying the resource requirements and settings of the job you want to run. The Frank cluster used PBS Torque. For the most part, each setting in one has an equivalent in the other. The following table lists examples of equivalent options for PBS and SLURM job scripts.
|Job submission||qsub -q <queue> -l nodes=1:ppn=16 -l mem=64g <job script>||sbatch -p <queue> -N 1 -c 16 --mem=64g <job script>|
|Job submission||qsub <job script>||sbatch <job script>|
|Node count||-l nodes=<count>||-N <min[-max]>|
|Cores per node||-l ppn=<count>||-c <count>|
|Memory size||-l mem=16gb||--mem=16g|
|Wall clock limit||-l walltime=<hh:mm:ss>||-t <days-hh:mm:ss>|
|Job name||-N <name>||--job-name=<name>|
A complete comparison of PBS Torque and SLURM script commands is available here.
To submit an interactive job, you should initiate with the srun command instead of sbatch. This command:
srun -n1 -t02:00:00 --pty bash
will start an interactive job. When the interactive job starts, you will notice that you are no longer on a login node, but rather one of the compute nodes.
[fangping@login0a ~]$ srun -n1 -t02:00:00 --pty bash
[fangping@n409 ~]$
This will give you 1 core for 2 hours.
Interactive jobs with x11 forwarding
If you would like to run an application that has a graphical user interface (GUI) and therefore requires X11, you must pass an authenticated X11 session from the login node to your interactive session on a compute node. Follow these steps:
Login from Linux or a Mac terminal:
ssh -X htc.crc.pitt.edu
Then initiate an interactive session with --x11 options.
srun -n1 -t02:00:00 --x11=first --pty bash
This will initiate an X11 tunnel to the first node on your list. --x11 has additional options of batch, first, last, and all.
Once in your interactive session you can launch software that has a GUI from the command line.
We have implemented Open OnDemand to run common GUI tools, such as RStudio, Jupyter Notebook, Jupyter Lab and Matlab.
Quality of Service
All jobs submitted to Slurm must be assigned a Quality of Service (QoS). QoS levels define resource limitations. The default QoS is normal.
|Quality of Service||Max Walltime||Priority factor|
- Walltime is specified in days-hh:mm:ss
If your job does not meet these requirements, it will not be accepted.
Jobs on the HTC cluster are executed in order of priority. The priority function has four components: Age, FairShare, QoS and JobSize. Each component has a value between 0 and 1, and each is weighted separately in the total job priority. Only the Age factor increases as the job waits.
- NOTE: The priority weights are intended to favor jobs that use more nodes for shorter wall times.
|Age||Total time queued. Factor reaches 1 at 14 days.|
|QoS||Priority factor from QoS levels above.||2000|
|JobSize||Factor approaches 1 as more nodes are requested||4000|
|FairShare||FairShare factor described below||2000|
- The maximum priority value is 10000 for any job.
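The weighted sum can be worked through with a short calculation. This is a sketch only: the function names job_priority and age_factor are hypothetical, the linear growth of the Age factor is an assumption, and the Age weight of 2000 is an assumption chosen so that the four weights add up to the stated maximum of 10000 (the table above does not show it).

```shell
# Hypothetical helper: Age factor for a job queued for the given number
# of days, assuming linear growth that is capped at 1 after 14 days.
age_factor() {
    awk -v days="$1" 'BEGIN {
        f = days / 14
        if (f > 1) f = 1
        printf "%.2f\n", f
    }'
}

# Hypothetical helper: total priority from the four factors (each 0..1).
# Arguments: age fairshare qos jobsize
job_priority() {
    awk -v age="$1" -v fs="$2" -v qos="$3" -v size="$4" 'BEGIN {
        w_age  = 2000   # ASSUMPTION: not listed in the table above
        w_fs   = 2000   # FairShare weight from the table
        w_qos  = 2000   # QoS weight from the table
        w_size = 4000   # JobSize weight from the table
        printf "%d\n", w_age * age + w_fs * fs + w_qos * qos + w_size * size
    }'
}

# Example: half the maximum age, FairShare 0.25, QoS 0, JobSize 1:
#   job_priority 0.5 0.25 0 1
```

With these weights, JobSize alone can contribute up to 4000 of the 10000 points, which is how the weights favor jobs that request more nodes.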
Even though jobs are expected to run in order of decreasing priority, backfill allows jobs with lower priority to fit in the gaps. A job will be allowed to run through backfill if its execution does not delay the start of higher priority jobs. To use backfill effectively, users are encouraged to submit jobs with as short a walltime as possible.
FairShare has been enabled which adjusts priorities for jobs based on historical usage of the cluster. The FairShare priority factor is explained on the Slurm website.
To see the current FairShare priority factor run sshare. Several options are available, please refer to man sshare for more details.
The FairShare factors for all users are listed with sshare -a.
On the HTC cluster all users are given equal shares. We may change this policy based on the usage of HTC clusters.
Local Scratch directory
Each node in the HTC Cluster has a single scratch disk for temporary data generated by the job. Local scratch directories are created on each node in the following location at the start of a job or allocation.
The $SLURM_SCRATCH environment variable is then set in the job's environment to the above scratch directory.
- The $SLURM_SCRATCH directories are removed from each node at the completion of the job
To copy files to the $SLURM_SCRATCH scratch disk on the master compute node just use cp or rsync. Remember, the initial working directory for the job is the directory from which the job was submitted. To allow srun to run the job from the $SLURM_SCRATCH scratch directory add --chdir.
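A typical pattern is to stage input into $SLURM_SCRATCH, run there, and copy the results back before the job ends, since the scratch directory is removed when the job completes. The sketch below illustrates that pattern under stated assumptions: the function name run_in_scratch and the file name result.txt are hypothetical, and the wc -l line is only a stand-in for the real workload.

```shell
# Hypothetical helper for a job script: stage an input file to the
# node-local scratch directory, work there, and copy the result back.
run_in_scratch() {
    local input="$1" outdir="$2"
    # SLURM_SCRATCH is only set inside a job on this cluster.
    if [ -z "${SLURM_SCRATCH:-}" ]; then
        echo "SLURM_SCRATCH is not set; run this inside a job" >&2
        return 1
    fi
    cp "$input" "$SLURM_SCRATCH/" || return 1
    (
        cd "$SLURM_SCRATCH" || exit 1
        # Placeholder workload: count the lines of the staged file.
        # Replace this with the real command.
        wc -l < "$(basename "$input")" > result.txt
    ) || return 1
    # Copy results out before the job ends and scratch is removed.
    mkdir -p "$outdir" && cp "$SLURM_SCRATCH/result.txt" "$outdir/"
}

# Usage inside a batch script:
#   run_in_scratch ./reads/reads_1.fq "$SLURM_SUBMIT_DIR/results"
```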