Since CRCD began over seven years ago, we have used Red Hat Enterprise Linux (RHEL) version 7 as the base operating system (OS) for our login and compute nodes. RHEL 7 is no longer meeting our needs. Among other issues, regular security updates are still required but now entail additional cost, and the latest software used by Pitt researchers requires updated libraries.
We will be upgrading to RHEL 9. This upgrade will be disruptive in that the software stack is being rebuilt in the new OS, and existing Slurm jobs will need to be resubmitted with adjusted submission scripts. To mitigate the impact, we have created a staging environment where CRCD team members and users can test job submission scripts in the new OS.
RHEL 9 Staging Environment Specifications
| Cluster | Partition | Nodes | Architecture | Compute/Node | Memory |
|---|---|---|---|---|---|
| rh9 | cpu | 1 | AMD EPYC 9575F | 128 cores | ~1.5 TB |
| rh9 | mpi | 1 | AMD EPYC 9575F | 128 cores | ~1.5 TB |
| rh9 | gpu | 2 | NVIDIA L40S | 4 GPUs | ~48 GB per GPU |
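Once you are connected (see the access methods below), you can confirm these partitions and node states yourself; this is a standard Slurm query and assumes the rh9 cluster is visible from the login node:

```
# List the rh9 cluster's partitions, node counts, and node states
[user@login2 ~] : sinfo -M rh9
```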
Access Methods
Access Method 1: Command-line ssh access
- The access point for using the rh9 cluster is the following host: login2.crc.pitt.edu
- This login node can be accessed in two steps:
1. ssh into H2P or HTC like normal:
```
user@my-local-pc:~$ ssh user@h2p.crc.pitt.edu
user@h2p.crc.pitt.edu's password:
...
```
2. ssh from there into login2:
```
[user@login1 ~] : ssh login2.crc.pitt.edu
...
```
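If you connect frequently, a ProxyJump entry in your local SSH configuration can combine the two steps into one. This is a minimal sketch; the host aliases are illustrative, and your local SSH setup and username may differ:

```
# ~/.ssh/config on your local machine (host aliases are illustrative)
Host h2p
    HostName h2p.crc.pitt.edu
    User your_pitt_username

Host rh9-login
    HostName login2.crc.pitt.edu
    User your_pitt_username
    ProxyJump h2p        # route the connection through the H2P login node
```

With an entry like this, ssh rh9-login performs both hops in a single command.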
Access Method 2: Open OnDemand
- Portal URL: seed.crc.pitt.edu.
- As usual, you will need to be connected to PittNet via the GlobalProtect VPN to access it.
- Updating some of the functionality is ongoing; please report anything that appears to be missing.
Checking Software Availability
Try searching for relevant modules by name from the command line first, outside the context of a job submission.
For a broad overview of available packages, use module avail
```
[user@login2 ~] : module avail

---------------------------------------------------- /software/rhel9/spack/modules/Core ----------------------------------------------------
   amdlibflame/5.0-qnlh6c            gatk/4.1.2.0-ryihs7        openslide/3.4.1-lq34bt
   anaconda3/2021.11-fqibig          gatk/4.1.4.0-dwdsjt        p7zip/17.05-nn5izq
   anaconda3/2022.10-x3avg2          gatk/4.1.4.0-t6lr77        packmol/20.0.0-oujfey
   anaconda3/2023.09-0-amgrwv (D)    gatk/4.2.2.0-m57fnu        paml/4.10.3-bwkrsj
   ants/2.4.3-bor5ky                 gatk/4.4.0.0-m7dcto        pandoc/2.19.2-do5ryv
   apptainer/1.1.9-zjyglv            gatk/4.5.0.0-sl3qs3 (D)    pangolin/master-hivha4
   aspera-cli/3.7.7-jkl6wp           gblocks/0.91b-c5xybu       parallel/20220522-wbuewg
   augustus/3.5.0-k2zdam             gcc/7.5.0-xp7vam           pcre/8.40-2v6t57
   ...
```
You can find the available versions of a given package with module spider package_name
```
[user@login2 ~] : module spider python

----------------------------------------------------------------------------------------------------------------------------------------
  python:
----------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        python/ondemand-jupyter-python3.13.5
        python/pytorch_251_311_cu124
        python/tensorflow_218_311
        python/3.7.0-52qxuq
        python/3.7.17-d53vam
        python/3.7.17-4pgmy5
        python/3.8.18-5jdolp
        python/3.9.18-oe3og4
        python/3.10.13-riplp2
        python/3.11.6-wnbaq7
        python/3.11.9-k3zy4r
        python/3.12.0-ig3l6e
        python/3.12.8-ydargp
        ...
```
Attempt to load the package into your environment with module load
```
[user@login2 ~] : module load python/3.12.8
[user@login2 ~] : module list

Currently Loaded Modules:
  1) python/3.12.8-ydargp
```
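After loading, a quick check confirms that the module now supplies the python on your PATH (an illustrative verification step, not part of the module output above):

```
# Sanity check: the python found first on PATH should come from the loaded module
[user@login2 ~] : which python
[user@login2 ~] : python --version
```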
You may see newer versions of the packages used in your workflow. This may be a good time to update your workflow to use them, or at least to check whether everything still works as-is with a newer version.
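For Python workflows in particular, a common step is rebuilding any virtual environments against the newer interpreter rather than reusing environments created under RHEL 7. A minimal sketch, assuming you maintain a requirements.txt for your project (the environment path is a placeholder):

```
# Rebuild a virtual environment against an RHEL 9 python module (sketch)
module purge
module load python/3.12.8
python -m venv ~/envs/myproject-rh9        # hypothetical location for the new environment
source ~/envs/myproject-rh9/bin/activate
pip install -r requirements.txt            # assumes a requirements.txt listing your dependencies
```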
If a module does not have your required version, fails to load, or is missing outright:
Submit a Support Ticket with the exact module name/version you need and any error messages encountered. We will work with you to install missing modules and dependencies, prioritizing reported issues.
Updating your Job Submission Script
To adapt a job submission script to work on the rh9 cluster, you will need to adjust the following items (a complete example script follows this list):
- Cluster specification: #SBATCH --cluster=rh9
- Partition specification: #SBATCH --partition=cpu (or gpu, mpi)
- Module commands: You should only attempt to load module names/versions that you have verified exist in the rh9 environment.
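Here is a minimal sketch of an adapted CPU job script, assuming a Python workload that uses one of the modules shown above; the job name, resource requests, time limit, and script name are placeholders to adjust for your own work:

```
#!/bin/bash
#SBATCH --job-name=rh9-test          # placeholder job name
#SBATCH --cluster=rh9                # target the RHEL 9 staging cluster
#SBATCH --partition=cpu              # or gpu / mpi, per the table above
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4            # placeholder resource request
#SBATCH --time=0-01:00:00            # placeholder time limit

# Load only modules you have verified exist in the rh9 environment
module purge
module load python/3.12.8

# Replace with your actual workload
python my_script.py                  # my_script.py is a hypothetical example
```

Submit it with sbatch as usual; the --cluster directive routes the job to the rh9 staging cluster, and squeue -M rh9 will show its status.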
Frequently Asked Questions
Q1: When will the remaining hardware from the RHEL 7 environment be brought over to RHEL 9? We intend to transition the rest of the hardware during the quarterly cluster maintenance on Tuesday, August 19th.
Q2: I compiled my own software under the RHEL 7 environment; will I need to recompile it under RHEL 9? Yes. The software stack we've made available has largely been reinstalled under RHEL 9, so it is best to assume your own software will need to be recompiled as well.
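As a rough sketch of what a rebuild might look like for a typical make-based project (the module and directory names are illustrative, and your project's build system may differ):

```
# Rebuild a self-compiled project against the RHEL 9 toolchain (illustrative)
module purge
module load gcc            # check `module spider gcc` for the versions actually available
cd ~/src/my_project        # hypothetical source directory
make clean && make
```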
Q3: Do I need to move data into this cluster context for testing? No, the usual filesystems (iX, ihome, etc.) are mounted and available to the RHEL 9 login and compute nodes.
Q4: What is the string of characters in the module names after the name and version? Many of the packages were reinstalled using an HPC package management tool called Spack. Spack automatically generates modulefiles compatible with Lmod and appends characters from each installation's unique hash to prevent naming conflicts among the modulefiles.
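You do not need to type the hash suffix when loading a module; as in the python example above, the name and version are enough for Lmod to resolve the full modulefile name. The gatk module shown here is just one example taken from the listing above:

```
# Loading by name/version only; Lmod resolves the hash suffix
[user@login2 ~] : module load gatk/4.5.0.0    # expected to resolve to gatk/4.5.0.0-sl3qs3, per the listing above
```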