Completed: Quarterly Cluster Maintenance, Tuesday, May 7
Dear CRC User Community,
We have completed our quarterly maintenance and have returned the clusters to production.
Notable changes to the CRC clusters include the following:
Banking System Updates
We introduced significant improvements to the banking system back-end. These changes provide a more consistent experience when viewing the summary status of your research group's current usage and Resource Allocation status, and they pave the way for more transparency and self-service in the management of your allocation.
The following wrapper scripts have been adjusted to use this updated accounting system:
crc-sus
crc-usage
crc-proposal-end
You will notice these changes taking effect in the coming days.
Each of these wrappers will require you to authenticate with the password you use to log in to the cluster.
crc-usage now shows a summary of any active resource allocation request(s), as well as the per-user usage counting toward their limits:
[nlc60@moss wrappers]: crc-usage
Please enter your CRC login password:
+----------------------------------------------------------------------------------+
|                 Resource Allocation Request Information for test                  |
+-------------+--------------------------------------------+-----------------------+
|     ID      |                   TITLE                    |    EXPIRATION DATE    |
+-------------+--------------------------------------------+-----------------------+
|    35556    |    Resource Allocation Request for Test    |      2024-05-31       |
+-------------+--------------------------------------------+-----------------------+
|             |  CLUSTER                                   |  SERVICE UNITS        |
|             |  ----                                      |  ----                 |
|             |  SMP                                       |  600                  |
|             |  MPI                                       |  100000               |
|             |  GPU                                       |  10000                |
|             |  HTC                                       |  25000                |
|             |                                            |                       |
+-------------+--------------------------------------------+-----------------------+
+-------------------------------------------------------------------------------------+
|                        Summary of Usage across all Clusters                         |
+-------------+------------------------+-------------------------+--------------------+
|     SMP     |    TOTAL USED: 96      |      AWARDED: 600       |     % USED: 16     |
+-------------+------------------------+-------------------------+--------------------+
|             |   USER                 |   USED                  |   % USED           |
|             |   ----                 |   ----                  |   ----             |
|             |   nlc60                |   96                    |   16               |
|             |   yak73                |   0                     |   < 1%             |
|             |                        |                         |                    |
+-------------+------------------------+-------------------------+--------------------+
|     MPI     |    TOTAL USED: 0       |     AWARDED: 100000     |     % USED: 0      |
+-------------+------------------------+-------------------------+--------------------+
|             |                        |                         |                    |
+-------------+------------------------+-------------------------+--------------------+
|     GPU     |    TOTAL USED: 0       |     AWARDED: 10000      |     % USED: 0      |
+-------------+------------------------+-------------------------+--------------------+
|             |                        |                         |                    |
+-------------+------------------------+-------------------------+--------------------+
|     HTC     |    TOTAL USED: 0       |     AWARDED: 25000      |     % USED: 0      |
+-------------+------------------------+-------------------------+--------------------+
|             |                        |                         |                    |
+-------------+------------------------+-------------------------+--------------------+
crc-proposal-end and crc-sus show their usual streamlined output of the end date and the SUs remaining in your allocations.
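For reference, a session might look like the following (the account name, dates, and values here are illustrative, and the exact output formatting may differ on your system):

[nlc60@moss wrappers]: crc-sus
Please enter your CRC login password:
Account test has 504 SMP, 100000 MPI, 10000 GPU, and 25000 HTC service units remaining.

[nlc60@moss wrappers]: crc-proposal-end
Please enter your CRC login password:
The active resource allocation for account test ends on 2024-05-31.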
Reaching the usage limit will yield the usual “AssocGrpBillingMinutes” Reason code in the squeue output.
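To check whether a pending job is being held for this reason, you can print the reason column with squeue; for example (the job ID and name below are hypothetical):

[nlc60@moss wrappers]: squeue -u nlc60 -o "%.10i %.12j %.8T %.24r"
     JOBID         NAME    STATE                   REASON
  12345678       my_job  PENDING   AssocGrpBillingMinutes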
Limits on Job Array Sizes
To maintain a high quality of service and equitable access to the cluster for all kinds of workflows, we have implemented limits on Job Array sizes (visible from squeue output).
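If you rely on large arrays, you can query the scheduler's configured cap and size your submissions accordingly. A minimal sketch follows (the limit value and script name are placeholders, not the actual CRC settings):

# Query the maximum array size Slurm will accept
[nlc60@moss wrappers]: scontrol show config | grep -i MaxArraySize
MaxArraySize            = 1001

# Submit an array sized within the limit (100 tasks here)
[nlc60@moss wrappers]: sbatch --array=0-99 my_job.sbatch
Submitted batch job 12345679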
Updated GPU Drivers
All GPU-enabled hardware is now up to date with the version 550.45 driver. Newer GPUs (A100s, L40s) have been updated to the open kernel module branch of this driver release. This should also resolve any remaining problems with newer versions of PyTorch.
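To confirm the update from a GPU node, you can check the driver version with nvidia-smi and verify that PyTorch sees the devices (the node name and device count below are illustrative):

# Report the installed NVIDIA driver version for GPU 0
[nlc60@gpu-n01 ~]: nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader
550.45

# Verify that PyTorch detects the GPUs under the new driver
[nlc60@gpu-n01 ~]: python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
True 4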
Thank you for your patience during this downtime. As always, please submit a help ticket if you encounter any post-maintenance problems.
Sincerely,
The CRC Team