Summer 2023: Hands-on AI/ML Workshops | Center for Research Computing and Data

CRC is offering a series of hands-on AI/ML workshops from May through July that incorporate materials from NVIDIA workshops.

Please note: workshops will be presented in a hybrid format with up to 20 in-person participants. If you register for an in-person session and the in-person session is full, you will be alerted and receive a video conferencing link via the e-mail address you provided. CRC will continue to follow University guidelines regarding COVID-19 precautions, so other changes could be possible depending on those guidelines.

~~Please use this link to register for the workshops.~~ Due to demand, we had to cap the number of users for the hands-on experience. Registration is now only for attending the workshops (virtually or in person).

Fundamentals of Accelerated Data Science

Part 1: Thursday, May 4, 1-4pm EST
Part 2: Thursday, May 11, 1-4pm EST
RAPIDS is a collection of data science libraries that allows end-to-end GPU acceleration for data science workflows. In this 2-part training, you'll:

Use cuDF and Dask to ingest and manipulate massive datasets directly on the GPU
Apply a wide variety of GPU-accelerated machine learning algorithms, using libraries including XGBoost, cuGraph, and cuML, to perform data analysis at massive scale
Perform multiple analysis tasks on massive datasets in an effort to stave off a simulated epidemic outbreak affecting the U.K.

Upon completion, you'll be able to load, manipulate, and analyze data orders of magnitude faster than before, enabling more iteration cycles and drastically improving productivity.
Instructor: Yassin Khalifa, Data Scientist and CRC Team member.
Panopto Recording: Part 1, Part 2
Slides and Jupyter notebooks: Part 1 and Part 2

Intermediate-level data science: Grasping Deep Learning: From Fundamentals to Applications

Thursday, June 15, 1-5pm EST
This four-hour workshop aims to provide participants with a solid foundation in deep learning. The lecture will cover essential topics, including model building, training, and evaluation, as well as popular deep-learning models. Participants will also be able to work on hands-on examples related to the fields of Computer Vision and Natural Language Processing. The objective is to equip participants with the necessary knowledge to independently learn more advanced deep learning concepts and models and apply their skills to practical problems.
To attend this lecture, participants should have a basic understanding of machine learning, including supervised and unsupervised learning, as well as experience with machine learning model training and evaluation. Prior experience with Python programming is also required.
Contents:

Deep learning fundamentals: motivation, evolution, general architecture, model training, and performance evaluations.
Convolutional neural networks (CNN): convolution operation, parameter sharing, applications, architectures, advantages, and disadvantages.
Natural language processing (NLP): vocabulary generation, bag of words, recurrent networks, attention mechanisms, transformers, advantages, and disadvantages.
Implementation and fine-tuning of ResNet CNN architecture for image classification and transformer-based language model for text classification.
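As a rough illustration of the "vocabulary generation" and "bag of words" topics listed above, the following minimal sketch uses only the Python standard library. It is not taken from the workshop notebooks; the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the two-sentence corpus are assumptions made for this example, and the tokenizer is deliberately simplistic.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [tok for tok in tokens if tok]

def build_vocabulary(documents):
    """Map each distinct token to an integer index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def bag_of_words(text, vocabulary):
    """Return a count vector over the vocabulary; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, idx in vocabulary.items():
        vector[idx] = counts[tok]
    return vector

# Hypothetical toy corpus, used only to exercise the helpers above.
corpus = ["The cat sat.", "The dog sat on the cat!"]
vocab = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocab) for doc in corpus]

print(vocab)
print(vectors)
```

A bag-of-words vector records how often each vocabulary word occurs but discards word order, which is one reason the later topics in the list (recurrent networks, attention, transformers) matter.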

Instructors: Yufei Huang, Leader, AI Research, UPMC Hillman Cancer Center, and Arun Das, Hillman Fellow for Innovative Cancer Research, UPMC Hillman Cancer Center.
Panopto Recording
Slides: Lec1, Lec2, and Lec3
Jupyter Notebooks

Data Parallelism: How to Train Deep Learning Models on Multiple GPUs

Part 1: Thursday, July 6, 1-4pm EST
Part 2: Thursday, July 13, 1-4pm EST
Modern deep learning challenges leverage increasingly large datasets and more complex models. As a result, significant computational power is required to train models effectively and efficiently. Learning to distribute data across multiple GPUs during model training opens up a wealth of new deep learning applications.
Additionally, the effective use of systems with multiple GPUs reduces training time, allowing for faster application development and much faster iteration cycles. Teams who are able to perform training using multiple GPUs will have an edge, building models trained on more data in shorter periods of time and with greater engineer productivity.
This 2-part workshop teaches you techniques for data-parallel deep learning training on multiple GPUs to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you’ll learn how to decrease model training time by distributing data to multiple GPUs, while retaining the accuracy of training on a single GPU.
By participating in this workshop, you'll learn to:

Understand how data parallel deep learning training is performed using multiple GPUs
Achieve maximum throughput when training, for the best use of multiple GPUs
Distribute training to multiple GPUs using PyTorch Distributed Data Parallel
Understand and utilize algorithmic considerations specific to multi-GPU training performance and accuracy
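The idea behind the objectives above can be sketched without any GPU. In data-parallel training, each worker computes a gradient on its own shard of the batch, and the gradients are averaged before the shared weights are updated. The pure-Python toy below is an assumption-laden illustration, not workshop material and not PyTorch Distributed Data Parallel itself: the "workers" are plain loop iterations, the model is a single weight `w` fitting `y = 3x`, and all function names are hypothetical.

```python
def gradient(w, batch):
    """Gradient with respect to w of the mean squared error of y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def split_evenly(batch, num_workers):
    """Give each simulated worker an equal-sized contiguous shard."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_gradient(w, batch, num_workers):
    """Average the per-shard gradients, the role an all-reduce plays
    in a real multi-GPU setup."""
    shards = split_evenly(batch, num_workers)
    local_grads = [gradient(w, shard) for shard in shards]
    return sum(local_grads) / num_workers

def train(batch, num_workers, lr=0.01, steps=200):
    """Plain gradient descent using the averaged gradient at every step."""
    w = 0.0
    for _ in range(steps):
        w -= lr * data_parallel_gradient(w, batch, num_workers)
    return w

# Hypothetical toy data: 8 samples of y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_single = train(data, num_workers=1)
w_multi = train(data, num_workers=4)

print(w_single, w_multi)
```

With equal-sized shards, the average of the shard gradients equals the full-batch gradient, so the simulated four-worker run reaches the same weight as the single-worker run up to floating-point rounding. In practice the global batch size usually grows with the number of GPUs, which is where the algorithmic considerations for accuracy mentioned above come in.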

Instructor: Yassin Khalifa, Data Scientist and CRC Team member.
Panopto recordings: Part 1, Part 2
Slides and Jupyter notebooks