Fall 2022 Next Generation Sequencing Workshops

These workshops were supported in part by the University of Pittsburgh seed project titled "University of Pittsburgh Computational Genomics Training Program".

High-throughput sequencing has brought abundant sequence data along with a wealth of new “-omics” protocols, and this explosion of data can be as bewildering as it is exciting. Our multi-day, hands-on workshops give researchers the resources and open-source tools to plan and execute successful bioinformatics and genomics experiments. These workshops, taught by experienced Bioinformatics Core faculty, cover both the theoretical and practical aspects of analyzing a wide range of NGS data on the HTC cluster.

These workshops have hands-on components that require the following to be set up before a workshop begins:

  1. Participants should have an account on the HTC cluster, which is the cluster we will use for demonstration purposes (page 1 of this documentation).
  2. Participants must be on a Pitt network (wired connection) or connected through a VPN (page 2 of this documentation).
  3. Participants must be able to submit jobs, i.e., their group’s account has not expired and their group’s service units (CPU-hours) have not been exhausted (page 4 of this documentation).

As a general rule, we offer no troubleshooting for technical setup issues at the workshops themselves! If you do not set up the workshop's technical prerequisites well in advance, you may not be able to participate fully in its hands-on activities.

You can find the titles of past NGS workshops and links to their recordings. For each workshop, there may also be slides used in the workshop and other additional relevant resources.

Familiarity with Linux and the Bash shell is vital for these workshops. Submitting, monitoring, and managing jobs on the HTC cluster largely involves command-line operations. We do not routinely teach beginning Linux classes. If you are new to Linux environments, we highly recommend that you work your way through one of the past recordings (Introduction to Linux for NGS, Spring or Fall 2021 workshop).

Register for all workshops: Fall 2022 NGS workshops

Next Generation Sequencing Techniques

Tuesday, Sep. 13
1:00pm - 3:00pm

This workshop will cover the basics of Next-Gen Sequencing library preparation for Illumina sequencers. Different library preparation techniques (DNA-seq, ChIP-seq, RNA-seq, Methyl-seq, and 10x Visium Spatial Transcriptomics) will be explained, along with quality control steps for the starting input material and the final libraries. The workshop will also discuss considerations for experimental design and the end goals of analysis prior to sequencing; sequencing basics and cost estimates will be discussed as part of the experimental design process. Presented by Amanda Poholek.

Overview of NGS data analysis using Pitt OnDemand

Tuesday, Sep. 20
1:00pm - 4:00pm

This workshop will go over the procedures for accessing the advanced computing and data systems on the HTC cluster through Open OnDemand. Using NGS data analysis examples, I will introduce the HTC software environment, the SLURM scheduling system, and various resource usage strategies, such as multi-core jobs and job arrays. I will review various modalities for analyzing NGS data on the HTC cluster, including Nextflow pipelines, Jupyter Notebook, RStudio Server, and Shiny apps. Presented by Fangping Mu.
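As a rough illustration of the job-array strategy, the minimal Python sketch below picks which sample a given array task should process. It assumes the script is launched as part of a SLURM job array (for example with `sbatch --array=0-2`), in which case SLURM sets the `SLURM_ARRAY_TASK_ID` environment variable for each task; the sample names and the `sample_for_task` helper are hypothetical placeholders.

```python
import os

# Hypothetical sample list; in practice this might be read from a file.
SAMPLES = ["sample_A", "sample_B", "sample_C"]


def sample_for_task(samples, env=os.environ):
    """Return the sample handled by this SLURM array task.

    SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array
    (e.g. one submitted with `sbatch --array=0-2`). Falls back to
    task 0 when run outside an array job.
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(samples):
        raise IndexError(f"array task {task_id} has no matching sample")
    return samples[task_id]


if __name__ == "__main__":
    print(f"Processing {sample_for_task(SAMPLES)}")
```

Each array task runs the same script, so one submission fans the work out across samples instead of requiring one hand-written job per sample.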

Introduction to epigenomics and ChIP-seq/ATAC-seq data analysis

Tuesday, Sep. 27
1:00pm - 4:00pm

This workshop will provide both a theoretical and a practical introduction to ChIP-seq and ATAC-seq data analysis. In the first half, we will present the principles of ChIP and ATAC sequencing, bioinformatics pipelines for peak calling, data visualization, and methods of motif discovery, along with a brief introduction to CUT&RUN sequencing. In the second half of the workshop, we will work hands-on with a real ChIP-seq dataset to practice the pipeline on the HTC cluster. Presented by Silvia Liu.

RNA-seq data analysis

Tuesday, Oct. 4
1:00pm - 4:00pm

The workshop will focus on running RNA-seq pipelines: from raw FASTQ files, through FastQC quality control and alignment to a reference genome, to generating gene expression counts. To facilitate learning, the workshop will be centered on a hands-on tutorial that guides students in processing data from raw reads through read counts, using a real case study appropriate for Illumina read data. Presented by Uma Chandran.
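FastQC reports many quality metrics; one of the simplest to reproduce by hand is the mean base quality of each read. The Python sketch below is not FastQC's implementation, only an illustration that assumes the common four-line FASTQ record layout and Phred+33 quality encoding. The `parse_fastq` and `mean_phred` names are hypothetical.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from FASTQ lines.

    Assumes the common 4-line layout: '@id', sequence, '+', quality.
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(c) - offset for c in qual)
```

For example, the quality character `I` encodes Phred 40 under Phred+33, so a read whose quality string is `IIII` has a mean quality of 40.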

Introduction to NGS data analysis and WES/WGS variant calling 

Tuesday, Oct. 18
1:00pm - 4:00pm

High-throughput sequencing technology involves a number of concepts and techniques that shape a project before any application-specific processing begins. This workshop introduces these more “universal” aspects of high-throughput sequence analysis, covering common file formats for sequence data and the limitations of sequencing technologies. We will then work through a hands-on exercise focusing on WES/WGS data processing for variant calling. Presented by Riyue Bao.
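Variant-calling workflows typically write their results in VCF format. As a hedged sketch of what a VCF data line contains, the Python code below parses the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) of a single record. It ignores header lines and per-sample genotype columns, and real analyses would normally rely on an established VCF library; the `Variant` type and `parse_vcf_line` function are hypothetical names for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class Variant(NamedTuple):
    chrom: str
    pos: int                           # 1-based position, as in VCF
    ref: str
    alts: List[str]                    # ALT may list several comma-separated alleles
    qual: Optional[float]              # None when QUAL is '.'
    filter: str
    info: Dict[str, Union[str, bool]]  # flag keys (no '=') map to True


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    if line.startswith("#"):
        raise ValueError("header lines are not data records")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, _id, ref, alt, qual, filt, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return Variant(
        chrom=chrom,
        pos=int(pos),
        ref=ref,
        alts=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=filt,
        info=info_dict,
    )
```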

Differential Expression and Functional Analysis

Friday, Oct. 28 (rescheduled)
1:00pm - 4:00pm

This hands-on workshop will introduce participants to the statistical methods and considerations used to perform differential gene expression analysis on bulk RNA-seq data with DESeq2. The workshop will also provide an overview of tools for functional analysis of differentially expressed (DE) genes, used to draw biological inferences from large gene lists. We will also briefly introduce DiffBind, which provides functions for processing DNA data enriched for genomic loci, such as ChIP-seq and ATAC-seq data. Presented by Dhivyaa Rajasundaram.
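DESeq2 corrects for differences in sequencing depth between samples using size factors estimated by the median-of-ratios method. The pure-Python sketch below illustrates that idea on a small count matrix; it is not DESeq2's R code, and `size_factors` is a hypothetical helper name. Each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across all samples.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows, one per gene, each a list of per-sample raw counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the log ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]
```

If one sample was sequenced exactly twice as deeply as another, every gene's ratio differs by a factor of 2, so the two size factors also differ by a factor of 2; dividing counts by them puts both samples on a comparable scale.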

Spatial transcriptomics data analysis

Tuesday, Nov. 1
1:00pm - 4:00pm

This interactive course will introduce participants to a new generation of spatially resolved transcriptomics assays provided by 10x Genomics. We will cover the analysis of spatial transcriptomics data in R, introducing participants to both the theoretical concepts and hands-on tutorials. Publicly available data will be used for this purpose, and the tutorials will cover pre-processing, dimensionality reduction and clustering, integration with single-cell RNA-seq data, deconvolution, cell-cell interactions, and co-expression analyses. Presented by Dhivyaa Rajasundaram.

nf-core RNA-seq analysis pipeline

Tuesday, Nov. 8
1:00pm - 4:00pm

Nextflow is a workflow manager developed specifically to ease the creation and execution of bioinformatics pipelines. nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow. nf-core pipelines are compatible with the HTC cluster's computational infrastructure, such as the SLURM job scheduler and Singularity containers for integrated software dependency management. We will introduce how to set up nf-core pipelines on the HTC cluster, then work through a hands-on exercise focusing on nf-core/rnaseq (https://nf-co.re/rnaseq), a bioinformatics pipeline for analysing RNA sequencing data obtained from organisms with a reference genome and annotation. Presented by Uma Chandran.
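nf-core/rnaseq reads its list of input samples from a CSV samplesheet. The Python sketch below builds one from paired FASTQ file names. The column names (`sample`, `fastq_1`, `fastq_2`, `strandedness`) and the `auto` strandedness value are assumptions based on recent nf-core/rnaseq releases, so check the usage documentation at https://nf-co.re/rnaseq for the version you run; the `SAMPLE_R1.fastq.gz` naming pattern and the `build_samplesheet` function are hypothetical.

```python
import csv
import io
import re
from pathlib import Path

# Column names assumed from recent nf-core/rnaseq usage docs; verify them
# against the documentation for the pipeline release you actually run.
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"(.+)_R([12])\.fastq\.gz$")


def build_samplesheet(fastq_paths, strandedness="auto"):
    """Return samplesheet CSV text with one row per sample.

    R1/R2 files are paired by sample name; samples with only an R1 file
    are written as single-end (empty fastq_2 column).
    """
    pairs = {}
    for path in map(Path, fastq_paths):
        match = FASTQ_PATTERN.match(path.name)
        if match is None:
            raise ValueError(f"unexpected FASTQ file name: {path.name}")
        sample, read = match.groups()
        pairs.setdefault(sample, {})[read] = str(path)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" not in reads:
            raise ValueError(f"sample {sample} has no R1 file")
        writer.writerow([sample, reads["1"], reads.get("2", ""), strandedness])
    return out.getvalue()
```

The returned text can be saved as, for example, `samplesheet.csv` and passed to the pipeline through its `--input` parameter.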

Long-read sequencing data analysis

Tuesday, Nov. 15
1:00pm - 2:00pm

This seminar will review long-read sequencing approaches in NGS technology and their bioinformatic challenges, which are caused by coverage biases, high base-calling error rates, scalability issues, and the limited availability of appropriate pipelines. Presented by Silvia Liu.

Deep learning for genomics

Tuesday, Dec. 6
1:00pm - 4:00pm

This workshop will give a hands-on introduction to running deep learning jobs on the GPU cluster using genomics examples. Using a single-cell RNA-seq example, I will introduce how to create a conda environment, install deep learning tools, and run them through an Open OnDemand Jupyter Notebook. I will then introduce how to submit deep learning jobs using Singularity containers. Presented by Fangping Mu.

T cell receptor (TCR) data analysis

Thursday, Dec. 15 (rescheduled)
1:00pm - 4:00pm

In this workshop, we will focus on characterizing the T cell receptor (TCR) repertoire of tumor-infiltrating T cells from bulk RNA-sequencing data. The workflow will cover the implementation of computational algorithms to extract the TCR hypervariable complementarity determining region 3 (CDR3), followed by descriptive statistics for TCR repertoires, shared clonotype analysis and repertoire comparison, repertoire diversity and gene usage analysis, and visualization. Presented by Dhivyaa Rajasundaram.
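Repertoire diversity is often summarized with Shannon entropy and a derived clonality score. The Python sketch below assumes each distinct CDR3 sequence is one clonotype and defines clonality as 1 - H / ln(N), where H is the Shannon entropy of clonotype frequencies and N is the number of distinct clonotypes; other tools may define clonotypes (e.g. also using V/J gene calls) or clonality differently. The `repertoire_diversity` function is a hypothetical name.

```python
import math
from collections import Counter


def repertoire_diversity(cdr3_sequences):
    """Return clonotype count, Shannon entropy, and clonality.

    Each distinct CDR3 sequence is treated as one clonotype (a simplifying
    assumption). Clonality = 1 - H / ln(N), with H the Shannon entropy
    (natural log) and N the number of distinct clonotypes.
    """
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty repertoire")

    freqs = [n / total for n in counts.values()]
    entropy = -sum(p * math.log(p) for p in freqs)
    n_clonotypes = len(counts)
    # A repertoire with a single clonotype is treated as fully clonal.
    if n_clonotypes == 1:
        clonality = 1.0
    else:
        clonality = 1 - entropy / math.log(n_clonotypes)
    return {"clonotypes": n_clonotypes, "entropy": entropy, "clonality": clonality}
```

For instance, four distinct CDR3s observed once each give the maximum entropy ln(4) and a clonality of 0, while a repertoire made entirely of one clonotype has entropy 0 and clonality 1.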