Clues to disease in a sea of molecular data

Uma Chandran lives at the hub of bioinformatics at Pitt. She directs two resources, the Cancer Bioinformatics Services (CBS) and the Genomics Analysis Core (GAC), that connect a vast range of researchers with computational support and expertise. The Center for Research Computing (CRC) is a key collaborator.

Genomics can be important in many fields, but its signature role is helping researchers understand the molecular basis of disease. Chandran, associate professor in biomedical informatics, coordinates resources that support dozens of genomic projects involving comparisons of normal and diseased cells in rare disorders, HIV, and several cancers: lung, prostate, and head and neck, among others. Her own work includes studies of gene expression in prostate cancer, pediatric vaccine reactions, and breast cancer.

A 2020 project sought to identify genes regulated by glucocorticoid steroid hormones. The glucocorticoid drug dexamethasone is commonly used for women at risk for preterm birth and reduces the risk of death and lung-related comorbidities for newborns. But prenatal exposure to excess glucocorticoids can lead to neural alterations that contribute to behavioral and cognitive impairments in some children. Chandran, along with CRC’s Fangping Mu, graduate student Kimberly Berry, and professor of pharmacology and neuroscience Donald DeFranco, performed complex bioinformatics analysis using two sequencing methods (technically, ATAC-seq, the Assay for Transposase Accessible Chromatin with high-throughput sequencing, and RNA sequencing) to examine the effects of dexamethasone on neural stem cells in a mouse model, with the aim of identifying possible alternative therapies. The paper was presented at the Annual Meeting of the Endocrine Society (the abstract was published in May in the Journal of the Endocrine Society and the full manuscript will be published in the fall).

Genomics’ potential insights and discoveries swim in an ocean of unanalyzed data. “Researchers have data from their own labs, but they don’t know what to do with the data,” explains Chandran. “That’s when they come to us.”

Chandran has been a champion of bioinformatics at Pitt for over 15 years but did not plan to be.

“I was not a computer person at all,” Chandran explains. “I was trained as a biologist. We did experiments, observed, and took notes. But when researchers began to leverage the human genome data and second-generation sequencing started to take off in the 2000s, I knew a lot of data would be coming out.”

Chandran taught herself. “I had no formal training – there was no formal bioinformatics training for biologists. Before that time, biologists were biologists and programmers belonged to an entirely different discipline.” Because bioinformatics has become very complex in the last decade, she does not want new generations of biology students to have to learn it on their own, and she strongly advocates for formal coursework in bioinformatics for all students in the biological and health sciences.

Fangping Mu at CRC agrees on the need for computational expertise. A CRC consultant and research assistant professor, Mu collaborates frequently with Chandran, who serves on CRC’s Advisory Committee.

“The primary problem with the explosion of biomedical datasets is not the data itself, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers who understand both genomics and computation,” explains Mu.

With a background in both genomics and computation, Mu helps researchers use the more than 240 software packages installed on CRC’s high-throughput computing cluster, which is designed to support genomics and bioinformatics. The crucial feature of high-throughput computing for biomedical research is the ability to rapidly move data through a pipeline – a workflow of analysis programs – and to run many different pipelines in parallel. Among other steps, Mu must balance the infrastructure to coordinate the speed and flow of sequencing data of the target genome, sequencing data of a reference genome, and the algorithms of the computational tools.
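To make the idea of a pipeline concrete, here is a minimal sketch in Python. It is an illustration only and does not represent the tools or workflows CRC actually runs: the stage functions (trim, to_upper, gc_fraction) and the sample names are hypothetical stand-ins for real analysis programs, and a thread pool stands in for the job scheduling a computing cluster would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-ins for analysis programs; a real pipeline would call
# tools such as aligners or quantifiers instead of these toy functions.
def trim(reads):
    """Drop reads shorter than 4 bases (toy quality filter)."""
    return [r for r in reads if len(r) >= 4]

def to_upper(reads):
    """Normalize base letters to upper case."""
    return [r.upper() for r in reads]

def gc_fraction(reads):
    """Return the fraction of G/C bases across all reads."""
    bases = "".join(reads)
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

# A pipeline is an ordered list of stages.
PIPELINE = [trim, to_upper, gc_fraction]

def run_pipeline(data, stages=PIPELINE):
    """Pass data through each stage in order; each output feeds the next."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

def run_many(samples, max_workers=4):
    """Run one pipeline per sample concurrently; return {sample_name: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_pipeline, reads)
            for name, reads in samples.items()
        }
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    samples = {
        "sample_a": ["acgt", "gg", "ccgg"],
        "sample_b": ["atat", "ttaa"],
    }
    print(run_many(samples))
```

Each sample moves through the same ordered stages, and the samples are processed independently of one another, which is what allows many pipelines to run side by side.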

Mu resolves complex issues to create coherent analyses. Software and data come from varied sources in varied forms; sometimes the software is of questionable origin. Many biology researchers are accustomed to software with familiar graphical user interfaces, not the command-line programming necessary to develop pipelines. Mu creates pipelines and also spends much of his energy tracking, managing, and debugging software.

CRC contributes significantly to bioinformatics education at Pitt. Every semester, Mu organizes and conducts free workshops through CRC in which Pitt health science researchers working in cutting-edge bioinformatics and sequencing analysis train the Pitt research community; the series also includes presentations from industry representatives. The Fall 2020 workshops begin Sept. 15 with a presentation by Mu on using the R language for genomics, followed by several workshops on sequencing analysis, including a presentation by Chandran.

Brian Connelly