Lessons from SC22 in Dallas

CRC's consultant team attended the Supercomputing 2022 conference in Dallas, Texas in November, with the theme "HPC (high performance computing) accelerates." We are sharing some of what the team learned with our user community.

Kim Wong, Research Associate Professor
and CRC Co-director

Kim Wong: advanced computing healthy, diverse, ripe for innovations

"HPC Accelerates" was the tagline for SC22, and equally appropriate would be the tagline "The Community Celebrates!" After experimenting, with limited success, with virtual and hybrid events in 2020 and 2021, SC22 welcomed almost 12,000 attendees to the conference in Dallas, with the majority participating in person. In-person participation was roughly 80 percent of what it was before the pandemic.

I am happy to report that the global advanced computing and data ecosystem is healthy, diverse, and ripe for innovations. On the horizon are next-generation GPUs and CPUs from Nvidia, AMD, and Intel. There will be new low-latency networking technologies entering the market, new storage platforms, new data management tools, new programming paradigms and tools, more cloud service offerings, new cooling technologies, and fresh ideas on how best to integrate these cutting-edge components to support computational- and data-enabled science and engineering. In short, we will have many more choices.

Choice is good. Competition is good. These complementary market forces germinate new thinking and innovative solutions that continually push the boundaries of progress. New ideas present both challenges and opportunities: how best to train students and researchers on the latest methods so they can seize new opportunities and become leaders in an increasingly data-driven economy.

We have entered the age of Exascale Computing. With ORNL's Frontier system taking the top spot on the TOP500 list at 1.1 exaflops, we have demonstrated a three-orders-of-magnitude increase in computing power since 2008, when Roadrunner launched us into the Petascale Computing era. Argonne's Aurora system will follow shortly in 2023, clocking in at an anticipated peak performance of more than 2 exaflops! This is indeed an exciting time for our field of super duper computing and data-driven science. Next year, SC23 will return to Denver, CO, the site of the last fully in-person conference, which set a record of 13,950 attendees in 2019, before the COVID-19 pandemic blunted the upward trajectory of participation. Below, my colleagues at CRC will share some of their SC22 highlights.


Jeffrey A. Raymond, HPC Systems Engineer
Pitt Information Technology

Jeff Raymond: new support for DEI, student education and cancer research 

Diversity, Equity and Inclusion were major focus topics at the SC22 conference and expo. From the technical program to the exhibit floor, and from SCinet (the world's fastest network!) to the Student Cluster Competition, every event reflected a pervasive drive to bring a diverse group of people together to solve grand challenge problems. Research has shown that tapping into under-represented groups, and ensuring they can contribute through all phases of service and system life cycles, results in better, more robust solutions.

One of the more popular and highly attended presentations was the recurring Women in High Performance Computing (WHPC) workshop, which asked the question: "Why are there not more women in HPC?" WHPC has grown into a year-round organization that provides mentoring, jobs, training, scholarships and other activities to support new, early-career and established women in HPC.

SC22 highlighted new efforts by ACM and IEEE, two of the professional organizations sponsoring the conference, to combat intellectual theft and ethics violations: SIGHPC CARES (Committee to Aid REporting on discrimination and haraSsment) and IEEE Assist.

Both SIGHPC CARES and IEEE Assist provide mechanisms to report and act on accusations of harassment, discrimination, misogyny, racism and other misconduct. In one case, an author submitted work that was not selected, then found elements of the original submission being used without credit. These programs will continue to make sure the SC conferences are a place where all can contribute, benefit, thrive and feel included.

SC22 had a strong technical program for students, with a focus on learning for people no matter what level of knowledge they bring to the table. From a beginner in HPC monitoring and systems support to an expert domain scientist in single-cell transcriptomics, there were many relevant sessions in a combination of formats to accommodate a variety of learning styles. Sessions included hands-on tutorials and workshops, panels, "birds of a feather" meetups, posters and papers.

One premier event at SC22 was the Student Cluster Competition (SCC), developed in 2007 to provide an immersive high performance computing experience to undergraduate and high school students. With sponsorship from hardware and software vendor partners, student teams design and build small clusters, learn scientific applications, apply optimization techniques for their chosen architectures, and compete under real-world scientific workloads in a 48-hour challenge, showing off their HPC knowledge for attendees and judges. An extension to the program enabled remote teams to participate as well.

Cancer is a grand challenge problem, and researchers, application developers and systems professionals in the HPC community are working to discover novel approaches to treat and cure it. On a personal note, I know well the passion for reining in cancer, having lost several family members to the disease over the years, including my father.

One of the most thorough explorations of the topic was "Computational Approaches for Cancer," a workshop now in its eighth year at the SC conference.

Two standout presentations looked at AI and deep learning language models in cancer research: "Long Document Transformers for Pathology Report Classification," presented by Mayanka Chandra Shekar of Oak Ridge National Laboratory, and "Generating Real-World Evidence in Cancer," presented by Amber Simpson of Queen's University in Canada. Find more about the workshop at: https://ncihub.cancer.gov/groups/cafcw/cafcw22/cafcw22_program.



Leonardo Bernasconi, Research Assistant
Professor and CRC Consultant

Leonardo Bernasconi: Quantum computing on the rise

Whereas classical computers operate on bits, each of which holds a single value (either 1 or 0), quantum computers work on qubits, which can exist in a superposition of both states. While a classical computer must follow one route at a time to reach a solution, a quantum computer can, loosely speaking, explore many possible routes simultaneously. Quantum computers are therefore useful, in principle, for addressing problems whose complexity makes them insoluble for classical computers.
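The bit-versus-qubit distinction can be sketched in a few lines of Python. This is a toy state-vector simulation (not any vendor's quantum SDK, and restricted to real amplitudes for simplicity): a qubit is a pair of amplitudes, a Hadamard gate puts a basis state into an equal superposition, and measurement collapses it with the Born-rule probabilities.

```python
import math
import random

def hadamard(state):
    """Hadamard gate: sends |0> into an equal superposition of |0> and |1>."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def measure(state):
    """Collapse the state: read 0 with probability a^2, otherwise 1."""
    a, _ = state
    return 0 if random.random() < a * a else 1

qubit = hadamard((1.0, 0.0))          # start in |0>, apply H
print(qubit)                          # amplitudes (0.707..., 0.707...)
shots = [measure(qubit) for _ in range(10_000)]
print(sum(shots) / len(shots))        # roughly half the shots read 1
```

A classical bit in this picture would always have one amplitude equal to 1 and the other 0; the superposition state above is what has no classical analogue.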

Quantum computers are typically unable to solve complex problems in isolation, but they can be coupled to classical computers, acting as accelerators.

Companies in Europe are making available on-premises (rather than cloud-based) quantum computer accelerators with up to 54 qubits. Although these devices cannot yet be used for production runs, they can be useful for training scientists in the application of quantum computing in physics, chemistry and engineering.

As one of these vendors put it: “Despite the rather elusive nature of quantum physics in science and its exceptionally high barrier of entry, early adopters will nevertheless gain a significant advantage in the upcoming quantum revolution.”





Chengnian (Cheng) Xiao,
CRC Engineering HPC Consultant

Cheng Xiao: Modern compression techniques for scientific data

In contrast to lossless compression, modern lossy compression techniques such as SZ or ZFP allow researchers to reach a compromise between accuracy and compression ratio. By exploiting local smoothness and redundancy in scientific data, compression ratios of 100 or higher can be achieved without noticeably affecting the visualization of derived quantities, such as gradients, computed from the compressed data. These techniques have been successfully applied to data originating from astrophysics, quantum chemistry and atmospheric sciences. However, they are still at an experimental stage and require input from domain specialists to detect and weed out potential bugs.
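The core idea behind predictive lossy compressors can be shown in a toy sketch (illustrative only; the actual SZ and ZFP algorithms are far more sophisticated). For smooth data, each value is predicted from its reconstructed neighbor, and only the prediction error, quantized to bins of a user-chosen error tolerance, is stored. The quantized codes are tiny integers that entropy-code extremely well, and the pointwise error stays within the tolerance by construction.

```python
import math

def compress(values, tol):
    """Predict each value from the previous reconstructed value and
    quantize the residual to bins of width 2*tol (error bounded by tol)."""
    first = values[0]
    codes = []
    prev = first
    for v in values[1:]:
        code = round((v - prev) / (2 * tol))  # small integer for smooth data
        codes.append(code)
        prev += code * 2 * tol                # track what decompress will see
    return first, codes

def decompress(first, codes, tol):
    out = [first]
    for code in codes:
        out.append(out[-1] + code * 2 * tol)
    return out

data = [math.sin(i / 50) for i in range(1000)]    # a smooth signal
first, codes = compress(data, tol=1e-3)
recon = decompress(first, codes, tol=1e-3)
print(max(abs(a - b) for a, b in zip(data, recon)))  # never exceeds 1e-3
print(max(abs(c) for c in codes))   # codes are small ints, easy to pack
```

Because the encoder predicts from the *reconstructed* value rather than the original, quantization errors do not accumulate, which is what makes the per-point error bound possible.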


Daniel Perrefort, Research Assistant
Professor and CRC Consultant

Daniel Perrefort: Thermal Performance

Thermal performance was a big theme at SC22 this year. Hardware manufacturers continue to push for improved performance, typically at the cost of significant power consumption and heat generation. Many top-of-the-line chips now require liquid cooling, and the days of air-cooled high-performance computing may be numbered. It will be interesting to see how HPC centers (Pitt included) will make the transition. Fortunately, there are still plenty of air-cooled solutions on the market to meet HPC demand for at least the next few years.


Nickolas Comeau,
CRC Research Computing Specialist

Nickolas Comeau: Scaling Deep Learning

At SC22, I attended a tutorial on scaling deep learning models in PyTorch from training on a single GPU to distributing that training across many GPUs.

We used Nvidia’s Nsight Systems tool to look under the hood of our model training. Upon loading the profiler output into the Nsight Systems GUI, a wealth of information about what the CPU/GPU were spending their time doing became available, making it easier to spot bottlenecks and speed up the training.

Finding and reducing inefficiencies in the GPU usage of your SLURM jobs may help you get more out of the service units you spend on our clusters, so I made Nsight Systems available for loading on the CRC clusters via our module system (`module load nsight-systems`). The command line version is called `nsys` and the GUI (try it out on VIZ) is `nsys-ui`. Thanks to CRC team member Cheng Xiao, there is an example GPU program and its profiling output in `/ihome/crc/how_to_run/nsight-systems`.


Fangping Mu, Research Assistant Professor
and CRC Consultant

Fangping Mu: New storage and cooling; cloud computing faces cost headwinds

I saw some innovative data storage systems presented at SC22:

  • Starfish is a unique software application for managing files and objects at any scale.
  • VAST is all-flash file and object storage.
  • weka.io is a data management provider that delivers a software-based data platform for high I/O workloads, such as deep learning.
  • Gen3 is a data platform for building data commons and data ecosystems.

As Daniel noted above, thermal performance was an important topic. Liquid cooling was described as the dominant cooling technology for the next generation of HPC, and we have about two years to transition to liquid cooling as the universal standard.

In contrast to some past presentations, there was a sober view of cloud computing. Cloud computing costs remain high relative to on-premises resources, but there are value-added services that are attractive for particular use cases, meaning that cloud computing will essentially remain a niche component of HPC (it currently occupies about 10 to 20 percent of the HPC market).