What's Down the Road for CRC?

With the recent addition of the AMD EPYC Genoa processors to the SMP cluster, right on the heels of upgrades to the GPU, HTC, and MPI clusters and to storage capacity in 2022, you can say that we have completed one full cycle of upgrades to every segment of CRC’s cyberinfrastructure. As you can see in the chart at right, our hardware landscape has tipped more towards the New than the Old.

So what’s next? Is there more to come? Yes. More GPUs!

A combination of direct user feedback and the usage metrics on our dashboard pointed to the need for more GPUs. We will be finalizing our RFQ (request for quotes) bidding process for the acquisition of nodes equipped with Nvidia L40S GPUs, which are based on the latest Ada Lovelace architecture. Benchmarks have shown the L40 (the L40S is the upgraded, higher-frequency version) to provide performance similar to the A100 for popular HPC applications while additionally providing up to 70% increased performance for AI training. We are targeting deployment of these new nodes before the winter break.

While our storage capacity has increased, the underlying performance has not improved commensurately with the growth of data science. We have received reports from our user base about the slowness of the filesystem when resources are under heavy usage. In coordination with Pitt IT, we have initiated a project to acquire a next-generation storage filesystem that is architected specifically to address high-bandwidth and high-IOPS requirements. The project is expected to conclude by the end of the Spring 2024 semester, with deployment during the Fall.

Lastly, to complement our open science resources, CRC is deploying a HIPAA-compliant environment suitable for analysis of data that may contain any of the 18 HIPAA identifiers. The environment is similar to the open science Slurm cluster, with additional safeguards to maintain data security and privacy. The initial compute/data environment will comprise 600 TB of storage based on self-encrypting drives, five CPU nodes (dual-socket AMD EPYC 9374F processors with 768 GB of RAM), and two nodes with Nvidia L40S GPUs. The CPU system will go live in a few weeks, and the GPUs will be added once we take delivery of the larger L40S purchase, which is expected before the winter break.

Wednesday, November 1, 2023