CRC Workshops

Spring 2024 Workshops

CRC's first Spring 2024 workshop will be the twice-yearly CRC Ecosystem On-ramp on Jan. 25. In addition to a hands-on introduction to Linux, the Spring sessions will include workshops on transitioning from a laptop to CRC, data manipulation and visualization, and the fundamentals of accelerated data science.

Please note: workshops will be presented in a hybrid format with up to 20 in-person participants. If you register for an in-person seat after the in-person session is full, you will be notified and sent a video-conferencing link at the e-mail address you provide. CRC will continue to follow University guidelines regarding COVID-19 precautions, so the format may change depending on those guidelines.

Please note: the "Introduction to Data Manipulation and Visualization" workshop has been moved to March 28 (the originally announced date fell during Spring Break).

Register here for all workshops


CRC Ecosystem On-ramp
Thursday Jan. 25, 1-4 pm EST
This on-ramp session will introduce new and returning users to the CRC’s specialized compute resources and data storage systems. Topics include ways to access the clusters, loading software with the Lmod module system, scheduling jobs with Slurm, and strategies for using resources efficiently.
Prerequisite: CRC user account
Instructors: Nick Comeau, Research Computing Specialist; Kim Wong, Research Associate Professor

Tutorial Materials

Panopto Recording


Hands-on Introduction to Linux
Thursday Feb. 15, 1-4 pm EST
Competency with Linux is crucial for making the most effective use of the CRC’s advanced cyberinfrastructure. While the CRC provides GUI portals to ease the transition from a laptop to our ecosystem, working at the command line interface (CLI) is the most efficient way to use it. In this hands-on workshop, we will demonstrate the CLI across a range of topics, including navigating the filesystem hierarchy; creating, deleting, and renaming files; file permissions; using the pipe operator (|) to chain basic commands into a single, more powerful command; aliases; text editing; and Bash shell scripting examples. Attendees will come away with an overview of common CLI operations and links to further reading.
Prerequisite: CRC user account
Instructor: Kim Wong, Research Associate Professor

Slides

Panopto Recording

Transitioning From Your Laptop to CRC
Thursday March 7, 1-4 pm EST
We will provide guidelines to help you transition your computational workflows to the CRC ecosystem, with examples of how to run, benchmark, and optimize your calculations on the CRC clusters. Topics will include selecting the right cluster for your needs, obtaining and monitoring resources on the CRC systems, exploiting the capabilities of the Slurm scheduler to maximize productivity, processing the results of your calculations, and moving data to and from the clusters. Examples from domain-specific workflows will be provided.
Prerequisites: a CRC account and familiarity with basic Linux commands and a text editor (e.g., nano or vim). Attending the CRC Ecosystem On-ramp (Jan. 25) and Hands-on Introduction to Linux (Feb. 15) workshops beforehand will be helpful.
Instructor: Leonardo Bernasconi, Research Assistant Professor

Introduction to Data Manipulation and Visualization
Thursday March 28, 1-4 pm EDT
Data handling and manipulation are key steps in developing machine learning (ML) applications in Python. This workshop will cover aspects of data manipulation such as basic table operations and handling missing data with pandas. It will also cover exploratory data analysis and visualization with scikit-learn, matplotlib, and seaborn.
Prerequisite: Basic Python programming knowledge
Instructor: Yassin Khalifa, Data Scientist

Fundamentals of Accelerated Data Science
Thursday April 4, 1-5 pm EDT
RAPIDS is a collection of data science libraries that enables end-to-end GPU acceleration of data science workflows. In this training, you'll:

  • Use cuDF and Dask to ingest and manipulate massive datasets directly on the GPU 
  • Apply a wide variety of GPU-accelerated machine learning algorithms, including XGBoost, cuGraph, and cuML, to perform data analysis at massive scale
  • Perform multiple analysis tasks on massive datasets in an effort to stave off a simulated epidemic outbreak affecting the U.K. 

Upon completion, you'll be able to load, manipulate, and analyze data on the GPU, potentially orders of magnitude faster than with CPU-only tools, enabling more iteration cycles and greatly improving productivity.
Prerequisites: Knowledge of data manipulation and visualization in Python 
Instructor: Yassin Khalifa, Data Scientist 


Past Workshops 2018-2023

Workshop | Semester | Links
CRC Cluster Basics Workshop | Fall 2023 | Panopto Recording; Hands-On Tutorial Content
Version Control with Git/GitHub | Fall 2023 | Panopto Recording
Density Functional Theory Calculations | Fall 2023 | Panopto Recording
Introduction to Intermediate-Level Python | Fall 2023 |
Introduction to Programming on GPUs with CUDA and Python | Fall 2023 |
Getting Started with Software Testing | Fall 2023 |
Hands-on AI/ML Workshops: Fundamentals of Accelerated Data Science | Summer 2023 | Panopto Recording: Part 1, Part 2; Slides and Jupyter Notebooks: Part 1, Part 2
Hands-on AI/ML Workshops: Intermediate-Level Data Science (Grasping Deep Learning: From Fundamentals to Applications) | Summer 2023 | Panopto Recording; Slides: Lec1, Lec2, and Lec3; Jupyter Notebooks
Hands-on AI/ML Workshops: Data Parallelism (How to Train Deep Learning Models on Multiple GPUs) | Summer 2023 | Panopto Recording: Part 1, Part 2; Slides and Jupyter Notebooks
CRC Ecosystem On-Ramp | Spring 2023 | Panopto Recording; Tutorial Codelab Content; Meeting Chat
Foundational Python Track Part 1: Introduction to Beginner-Level Python | Spring 2023 | Panopto Recording; Slides; Meeting Chat
Foundational Python Track Part 2: Introduction to Intermediate-Level Python | Spring 2023 | Panopto Recording; Slides
Foundational Python Track Part 3: Introduction to Data Manipulation and Visualization | Spring 2023 | Panopto Recording; Slides; Jupyter Notebook
How to Access and Use the CRC Ecosystem | Fall 2022 | Online Material; Panopto Recording
Version Control with Git and Data Management Perspectives | Fall 2022 | Panopto Recording
Introduction to Scientific Programming on GPUs | Fall 2022 | Panopto Recording; PDF Version of Slides
Introduction to Machine Learning with Python | Fall 2022 | Panopto Recording; Slides Website
Introduction to Using the Cluster | Spring 2022 | Panopto Recording: Session 1; Panopto Recording: Session 2; Presentation
Practical Everyday Linux | Spring 2022 | Panopto Recording; Presentation
Introduction to Python | Spring 2022 | Panopto Recording
Best Practices for Writing Intermediate-Level Python | Spring 2022 | Panopto Recording
Overview of R for Data Work and Presentation | Spring 2022 | Panopto Recording; Class Notebook
Cluster Training | Fall 2021 | Panopto Recording; Panopto Recording: Intro to HPC World; Panopto Recording: Submit a Job Array
Advanced R Workshop | Spring 2021 | Workshop Materials
Basic R | Spring 2021 | Workshop Materials; Panopto Recording
Intel Developer Tools Workshop | Fall 2020 | Intel Cluster Tools 1; Intel Cluster Tools 2; Intel Compilers Overview
Introduction to Linux | Fall 2020 | NIH HPC quiz; Software Carpentry's 'helping lab people compute smartly' Unix Intro; An Introduction to Linux workshop
Cluster Training | Fall 2018 | Panopto Recording
R Programming | Spring 2018 | Panopto Recording
Introduction to C Programming | Spring 2018 | Panopto Recording
Hybrid OpenMP/MPI Programming | Spring 2018 | Panopto Recording: Session 1; Panopto Recording: Session 3
CUDA | Spring 2018 | Panopto Recording