Completed: Quarterly Cluster Maintenance, Wednesday, January 3

Dear CRC User Community,

We have completed our quarterly maintenance and have returned the clusters to production.

Some notable changes to the CRC clusters are the following:

  • Enabled cgroup-based resource management for SLURM jobs on all clusters
  • Updated SLURM to 22.05.11
  • Upgraded ix and ix1
  • Implemented a usage policy for the a100_multi partition on the GPU cluster:
      1. Jobs must request at least 2 nodes and no more than 8 nodes.
      2. Jobs submitted to this partition can no longer undersubscribe the nodes they request; each job must request the full number of GPUs on every node. Attempting to undersubscribe will yield the following message on submission:
         ERROR: Your job is not requesting the full number of GPUs on the a100_multi partition node
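
To check a planned a100_multi job against the policy above before submitting it, something like the following Python sketch may help. It is illustrative only and is not the check SLURM itself runs. The function name and parameters are hypothetical. The number of GPUs per node is not stated in this announcement, so you must supply it yourself; the value used in the example call is only a placeholder.

```python
from typing import List

# Node limits taken from the a100_multi policy in this announcement.
MIN_NODES = 2
MAX_NODES = 8


def check_a100_multi_request(nodes: int,
                             gpus_requested_per_node: int,
                             gpus_available_per_node: int) -> List[str]:
    """Return a list of policy problems; an empty list means the request looks fine.

    gpus_available_per_node is an assumption supplied by the caller; this
    announcement does not state how many GPUs each a100_multi node has.
    """
    problems = []
    if nodes < MIN_NODES or nodes > MAX_NODES:
        problems.append(
            f"Jobs must request at least {MIN_NODES} nodes "
            f"and no more than {MAX_NODES} nodes"
        )
    if gpus_requested_per_node < gpus_available_per_node:
        # Same wording as the message shown on submission.
        problems.append(
            "ERROR: Your job is not requesting the full number of GPUs "
            "on the a100_multi partition node"
        )
    return problems


if __name__ == "__main__":
    # Placeholder GPU count per node; replace with the real value for your nodes.
    placeholder_gpus_per_node = 8
    for problem in check_a100_multi_request(1, 4, placeholder_gpus_per_node):
        print(problem)
```

An empty result means the request satisfies both rules listed above; it does not guarantee that the scheduler will accept the job.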

Thank you for your patience during this downtime. As always, please log in and submit a help ticket if you encounter any post-maintenance problems.

Sincerely, 

The CRC Team