Spring 2024 Workshops

CRC's first Spring 2024 workshop will be the twice-yearly CRC Ecosystem On-ramp on Jan. 25. In addition to an introductory Linux workshop, the Spring sessions will include workshops on transitioning from a laptop to CRC, data manipulation and visualization, and fundamentals of accelerated data science.

Please note: workshops will be presented in a hybrid format with up to 20 in-person participants. If you register for an in-person session that is already full, you will be notified and sent a video conferencing link at the e-mail address you provide. CRC will continue to follow University guidelines regarding COVID-19 precautions, so further changes are possible depending on those guidelines.

Please note: the date for the "Introduction to Data Manipulation and Visualization" workshop has been changed to March 28 (the date first announced was during Spring Break). 

CRC Ecosystem On-ramp
Thursday Jan. 25, 1-4 pm EST
This on-ramp session will introduce new and returning users to the CRC’s specialized compute resources and data storage systems. Topics covered include ways to access the clusters, loading software tools with the Lmod module system, scheduling jobs with Slurm, and resource usage strategies.
Prerequisite: CRC user account
Instructors: Nick Comeau, Research Computing Specialist; Kim Wong, Research Associate Professor

Hands-on Introduction to Linux
Thursday Feb. 15, 1-4 pm EST
Competency with Linux is crucial to using the CRC’s advanced cyberinfrastructure effectively. While the CRC provides GUI portals to ease the transition from a laptop to our ecosystem, working at the command line interface (CLI) offers the greatest efficiency. In this hands-on workshop, we will demonstrate CLI usage across topics including navigating the filesystem hierarchy; creating, deleting, and renaming files; file permissions; using pipes to chain basic commands into a more powerful one; aliases; text editing; and Bash shell scripting examples. Attendees will gain an overview of common CLI operations and receive links to further reading.
Prerequisite: CRC user account
Instructor: Kim Wong, Research Associate Professor


Transitioning From Your Laptop to CRC
Thursday March 7, 1-4 pm EST
We will provide guidelines to help transition your computational workflows to the CRC ecosystem. We will give examples of how to run, benchmark, and optimize your calculations on the CRC clusters. Topics covered will include: selecting the correct cluster for your needs, obtaining and monitoring resources on the CRC systems, understanding how to exploit the capabilities of the Slurm scheduler to maximize productivity, processing the results of your calculations, and moving data to and from the clusters. Examples from domain-specific workflows will be provided.
Prerequisites: a CRC account and familiarity with basic Linux commands and a text editor (e.g., nano or vim). Having attended the CRC Ecosystem On-ramp (Jan. 25) and Hands-on Introduction to Linux (Feb. 15) workshops will be an advantage.
Instructor: Leonardo Bernasconi, Research Assistant Professor

Introduction to Data Manipulation and Visualization
Thursday March 28, 1-4 pm EDT
An important step towards developing ML applications in Python is data handling and manipulation. This workshop will cover different aspects of data manipulation such as basic table operations and missing data handling using pandas. The workshop will also cover exploratory data analyses and visualization using scikit-learn, matplotlib, and seaborn. 
Prerequisite: Basic Python programming knowledge
Instructor: Yassin Khalifa, Data Scientist

Fundamentals of Accelerated Data Science
Thursday April 4, 1-5 pm EDT
RAPIDS is a collection of data science libraries that allows end-to-end GPU acceleration for data science workflows. In this training, you'll: 

  • Use cuDF and Dask to ingest and manipulate massive datasets directly on the GPU 
  • Apply a wide variety of GPU-accelerated machine learning algorithms, including XGBoost, cuGraph, and cuML, to perform data analysis at massive scale
  • Perform multiple analysis tasks on massive datasets in an effort to stave off a simulated epidemic outbreak affecting the U.K. 

Upon completion, you'll be able to load, manipulate, and analyze data orders of magnitude faster than before, enabling more iteration cycles and drastically improving productivity. 
Prerequisites: Knowledge of data manipulation and visualization in Python 
Instructor: Yassin Khalifa, Data Scientist 

