Computational tools to enlighten dark areas of cancer genetics

                                            Xiaosong Wang

In biomedical research, integrated computational and wet-lab teams offer possibilities to quickly test hypotheses, interpret data, and find mistakes. The interchange calls for the kind of flexible, responsive, and powerful computing and consulting resources offered by the Center for Research Computing.

Xiaosong Wang, associate professor in pathology and biomedical informatics, leads the Computational Genomics and Translational Cancer Biology lab in the Pitt Cancer Institute. This unified computational and wet laboratory explores cancer genomics using next-generation sequencing and genome profiling, in a multidisciplinary approach that unites researchers in bioinformatics, genetics, and molecular and cell biology. The lab’s guiding principle is translational “bench to bedside” research – transforming genomic data into precision medicine to battle cancers, particularly breast cancer.

The lab targets what Wang describes as underexplored aspects of the cancer genome – cryptic structural mutations, mutations that are not visible until combined with other mutations.

“These are dark areas of breast cancer,” says Wang. “There are very few genetic biomarkers for more aggressive breast cancers. We are looking at the structural aspect, analyzing data at the whole genome sequencing level and analyzing the structural mutations in the sequence.”

The work relies heavily on compute power and support from CRC.

“In our research, we build our own computational tools that require quick feedback from the CRC support team to debug,” explains Wang, who sits on CRC’s advisory committee. “With the outstanding support from CRC, we get a response within a few hours for the tickets that might otherwise take days in other supercomputing resources we have used. CRC is a fantastic resource.”

Genomic sequencing data exists at multiple molecular levels that must be fused together to produce a meaningful analysis. The Wang lab combines data from databases of RNA and genome sequencing, DNA copy number, gene expression, and functional gene set signatures (characteristic patterns of DNA aberrations, or pathways in the genome), among others. The lab develops computational methods to integrate this multidimensional genomic data using CRC resources.

Gene set signatures can be redundant because different sources follow different rules for naming and categorizing them. To be valuable, a tool must be able to flag redundant data and pass through only unique signatures. Post-doc Xu Chia and graduate student Sanghoon Lee in Wang’s lab helped develop a powerful computational method that brings together comprehensive sets of molecular data to interpret complex changes in gene signatures. The two-stage method substantially improves the ability to quantify novel gene and pathway functions across the entire genome.
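To make the idea of flagging redundant signatures concrete, here is a minimal sketch. It is an illustration only, not the lab's method: it assumes that redundancy can be approximated by Jaccard overlap between gene lists, the 0.8 threshold is arbitrary, and the signature names and gene lists are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return the Jaccard similarity |a & b| / |a | b| of two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(
    signatures: Dict[str, FrozenSet[str]], threshold: float = 0.8
) -> Tuple[Dict[str, FrozenSet[str]], List[Tuple[str, str]]]:
    """Keep the first signature of each near-duplicate group.

    Returns the retained signatures and a list of (dropped, matched_kept) pairs.
    The threshold is an assumption chosen for illustration.
    """
    kept: Dict[str, FrozenSet[str]] = {}
    flagged: List[Tuple[str, str]] = []
    for name, genes in signatures.items():
        # Find the first already-kept signature that overlaps heavily with this one.
        match = next(
            (k for k, g in kept.items() if jaccard(genes, g) >= threshold), None
        )
        if match is None:
            kept[name] = genes
        else:
            flagged.append((name, match))
    return kept, flagged


# Hypothetical signatures: two sources describe nearly the same pathway
# under different names.
signatures = {
    "SOURCE_A_DNA_REPAIR": frozenset({"BRCA1", "BRCA2", "RAD51", "ATM", "CHEK2"}),
    "SOURCE_B_DNA_DAMAGE_RESPONSE": frozenset(
        {"BRCA1", "BRCA2", "RAD51", "ATM", "CHEK2", "TP53"}
    ),
    "SOURCE_A_ESTROGEN_SIGNALING": frozenset({"ESR1", "PGR", "GATA3", "FOXA1"}),
}
unique, redundant = filter_redundant(signatures)
print(sorted(unique))
print(redundant)
```

In this toy input, the second signature shares five of six genes with the first, so it is flagged and only two signatures pass through. A real tool would also have to reconcile gene naming across sources before comparing sets.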

In the first stage, an algorithm known as universal ConSig (uniConSig) analysis pulls selected characteristics associated with a gene from several databases – and repeats the process over thousands of iterations while the gene signatures are compared to signatures in lists of experimental genes. In the second stage, pathways are analyzed via an algorithm called Concept Signature Enrichment Analysis (CSEA), which quantifies precise functional associations between molecular markers by interpreting their shared pathways. Understanding shared pathways could have wide applications in discovering gene functions underlying diseases.

“To simplify, uniConSig scores the gene by function, then CSEA scores pathways by gene set signature,” Lee explains. “When compared to known methods, CSEA outperforms the Gene Set Enrichment Analysis, which is the widely used method to determine the pathways enriched in a defined experimental gene set.”
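The "score genes, then score pathways" pattern Lee describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not the published uniConSig or CSEA algorithms: the stage 1 weighting scheme, the unweighted running-sum statistic in stage 2, and all concept, pathway, and gene lists are hypothetical stand-ins.

```python
from typing import Dict, List, Set


def score_genes(concepts: Dict[str, Set[str]], training: Set[str]) -> Dict[str, float]:
    """Stage 1 (toy): score each gene by the concepts it shares with training genes.

    A concept's weight is the fraction of its members that are training genes;
    a gene's score is the sum of the weights of the concepts it belongs to.
    """
    scores: Dict[str, float] = {}
    for members in concepts.values():
        if not members:
            continue
        weight = len(members & training) / len(members)
        for gene in members:
            scores[gene] = scores.get(gene, 0.0) + weight
    return scores


def enrichment_score(ranked: List[str], pathway: Set[str]) -> float:
    """Stage 2 (toy): unweighted running-sum enrichment of a pathway in a ranked list.

    Walk down the ranking, stepping up on pathway genes and down on the rest,
    and return the running-sum value with the largest absolute deviation from zero.
    """
    hits = sum(1 for gene in ranked if gene in pathway)
    misses = len(ranked) - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked:
        running += 1.0 / hits if gene in pathway else -1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical inputs for illustration only.
concepts = {
    "concept_repair": {"BRCA1", "BRCA2", "RAD51", "PALB2"},
    "concept_hormone": {"ESR1", "PGR", "GATA3"},
    "concept_mixed": {"BRCA1", "ESR1", "PALB2"},
}
training = {"BRCA1", "BRCA2", "RAD51"}
pathways = {
    "pathway_hr_repair": {"BRCA1", "PALB2", "RAD51"},
    "pathway_hormone": {"ESR1", "PGR"},
}

gene_scores = score_genes(concepts, training)
# Rank genes from highest to lowest score; ties are broken alphabetically.
ranked = sorted(gene_scores, key=lambda g: (-gene_scores[g], g))
pathway_scores = {name: enrichment_score(ranked, genes) for name, genes in pathways.items()}
print(ranked)
print(pathway_scores)
```

Here PALB2 is not a training gene but ranks near the top because it shares concepts with training genes, which is the kind of inference a function-based gene score is meant to support. The repair pathway then scores positive and the hormone pathway negative.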

The wet laboratory and computational wings of the Wang lab work in tandem. The wet lab sequences tissue samples of tumors, and that raw data goes to Lee along with data from sources such as the Cancer Genome Project at the Wellcome Sanger Institute. He uses CRC’s advanced next-generation sequencing resources to standardize the data and analyze it using the computational tools developed in the lab.

“So if the wet lab wants to test structural mutations that might be in a patient sample, we look for rearranged genes computationally – to validate the result, the wet lab performs clinical studies on the sample looking for the same mutation, and performs genetic engineering to over-express or deplete the mutation in the cell line.”

Lee relies on CRC’s high-throughput computing cluster to process the daunting volumes of transcriptome (the sequences of messenger RNA) and genomic sequencing data from several large cancer genome projects.

CRC research faculty consultant Fangping Mu works with biomedical informatics researchers processing genomic data using Pitt CRC’s high-throughput computing cluster. “In my lab I can’t run multiple jobs. At CRC I can run jobs for several projects at once. When we have issues, Fangping usually responds in real time.”

“Bioinformatics analyses involve transforming files in parallel through a series of steps, called a pipeline or a workflow,” Mu explains. “Typically, these transformations are done by existing command line software. But very often software packages are not compatible. Pitt CRC installs and configures the analysis packages, but I also spend a lot of time debugging.”
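A pipeline of the kind Mu describes can be pictured as a chain of command-line steps, each reading the file the previous step wrote. The sketch below is a minimal, sequential illustration and not CRC's setup: the `run_pipeline` helper is hypothetical, and the two "tools" are Python one-liners standing in for real analysis packages.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Sequence


def run_pipeline(steps: Sequence[List[str]], first_input: Path, workdir: Path) -> Path:
    """Run each command in order; every step reads the previous step's output file.

    Each command receives two extra arguments: the input path and the output path.
    check=True stops the pipeline at the first step that exits with an error.
    """
    current = first_input
    for index, command in enumerate(steps, start=1):
        output = workdir / f"step{index}.txt"
        subprocess.run(command + [str(current), str(output)], check=True)
        current = output
    return current


# Stand-in "tools" (assumptions for illustration): one uppercases the file,
# the other drops blank lines.
UPPERCASE = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
]
DROP_BLANK = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').writelines("
    "line for line in open(sys.argv[1]) if line.strip())",
]

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    raw = workdir / "raw.txt"
    raw.write_text("acgt\n\nttga\n")
    final = run_pipeline([UPPERCASE, DROP_BLANK], raw, workdir)
    result = final.read_text()
    print(result)
```

Because every step agrees on a simple "input path, output path" contract, the steps compose cleanly. Real packages rarely share such a contract, which is one reason the compatibility problems Mu mentions take time to debug. On a cluster, many such chains would typically run side by side, one per sample.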

“Fangping is ultra-helpful, ultra-responsive,” says Wang. “He saves us from substantial problems and delays so we can concentrate on this work, which is very new. Some areas in cancer are known, but some are untouched. We need to develop the tools to enlighten that unknown area. We develop those tools with CRC.”