Heng Huang wants to connect brains – both within individual brains and between the collective brains of networks of researchers. Funded by $1.2 million from the National Science Foundation’s Big Data program, Huang leads a group of multi-year projects known as Brain Big Data – a machine learning framework for data mining across multiple research sites. It is one of 11 projects – eight funded by NSF and three by NIH – that Huang leads on machine learning, big data mining, computational neuroscience, health informatics, and precision medicine.
Brain Big Data aims to advance research into diagnostic tools for brain disorders based on the concept of connectomes – analyzing gene sequencing, clinical imaging, and biochemical markers together within a unified data set. Brain Big Data is developing the infrastructure of a network that will let researchers at multiple institutions share and mine data. Huang’s team creates machine learning tools to help solve the computational, logistical, and even legal challenges of multi-site collaboration – researchers will often need to share protected health information. Collaborating institutions, including university medical schools and hospital systems, are contributing 10 years of collected data to Brain Big Data.
“The tools we are developing are based on similar computation, but use different data,” explains Huang, John A. Jurenko Endowed Professor in Electrical and Computer Engineering and Biomedical Informatics. “For instance, consider genetic data and data from MRI and PET scans. One is imaging data; one is sequencing data. If it were possible to integrate that data, we could create a tool for early Alzheimer’s prediction, for instance. It would be possible to identify genetic, biochemical, and phenotypic – or physical – biomarkers together in one analysis.”
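The kind of integration Huang describes can be illustrated with a small sketch: standardizing two modalities so that neither dominates by scale, then concatenating them into one unified matrix for a single analysis. All names, shapes, and values below are illustrative placeholders, not the project’s actual data or methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features for the same 100 subjects (illustrative only):
imaging_features = rng.normal(size=(100, 20))          # e.g. regional brain volumes
genetic_features = rng.integers(0, 3, size=(100, 50))  # e.g. SNP genotypes coded 0/1/2

def standardize(x):
    """Zero-mean, unit-variance scaling per variable."""
    x = x.astype(float)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# One unified data set: imaging and sequencing variables side by side,
# ready for a single joint analysis.
unified = np.hstack([standardize(imaging_features),
                     standardize(genetic_features)])
print(unified.shape)  # (100, 70)
```

The standardization step matters because genotype counts and imaging measurements live on very different scales; without it, one modality would dominate any downstream model.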
Combining data formats efficiently is a fundamental challenge. Imagine meshing old videotapes and MP4s into one analyzable format – and then including scanned text from film scripts in the analysis.
Huang’s team attacked the challenge using CRC resources to develop, test, and refine algorithms that blend and process diverse imaging data more accurately and efficiently than existing processes. Graduate students Shangqian Gao and An Xu work with CRC resources to design deep neural network algorithms that compress multiple formats of neuroimage and brain network data.
“The imaging data is multi-modal, meaning the variables are not the same,” explains Gao. “The goal is to analyze the data using one set of variables and then compare to the result using different variables. We need to blend inconsistent variables that represent very heterogeneous brain data.”
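One simple way to make inconsistent variable sets comparable – a minimal sketch under stated assumptions, not the team’s actual method – is to reduce each modality to the same low dimension with a PCA-style projection; once both representations share a dimensionality, alignment techniques such as canonical correlation analysis can relate them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two modalities with different variable sets for the same 60 subjects
# (dimensions are illustrative, not real data).
mri = rng.normal(size=(60, 32))   # imaging-derived variables
gene = rng.normal(size=(60, 80))  # sequencing-derived variables

def project(x, k=8):
    """Project a centered data matrix onto its top-k principal directions."""
    x = x - x.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:k].T

z_mri, z_gene = project(mri), project(gene)
# Both modalities now live in the same 8-dimensional space and can be
# passed to an alignment method for joint analysis.
print(z_mri.shape, z_gene.shape)  # (60, 8) (60, 8)
```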
Gao was the lead author on a paper published in 2019 in the proceedings of the prestigious Computer Vision and Pattern Recognition conference, based on work done with the resources of CRC’s graphics processing units (GPUs), high-memory parallel computing technology first developed for computer games. Gao and team developed two machine learning models based on deep neural networks that greatly reduce the memory and computational cost of combining the imaging data in one analysis.
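Deep network compression of this kind is often achieved by pruning low-magnitude weights. The sketch below shows generic magnitude pruning of a hypothetical layer – the published models and their actual pruning criteria are not reproduced here – to illustrate how zeroing small weights cuts memory and compute.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dense-layer weight matrix (illustrative size only).
weights = rng.normal(size=(256, 512))

# Magnitude pruning: keep only the largest 20% of weights by absolute
# value and zero the rest, yielding a sparse, cheaper-to-store layer.
threshold = np.quantile(np.abs(weights), 0.8)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

sparsity = (pruned == 0).mean()
print(f"sparsity: {sparsity:.2f}")  # ~0.80
```

In practice, pruned networks are usually fine-tuned afterward to recover accuracy lost when the small weights are removed.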
CRC presented many advantages.
“We developed the code on a small scale on our own server, but we had no idea if the algorithm could actually handle the necessary volume of data,” Gao relates. “We needed to run the code on CRC. With CRC, we could open more processors and accomplish many tasks in a short time. We had four to five experiments running in parallel and used up almost all of Dr. Huang’s computing allocation.”
Huang reflects on the possibilities of the Brain Big Data projects. “Nobody knows what causes Alzheimer’s. We know co-factors, but not causes. The hope is that these projects will lead to multiple tools for other researchers to identify potential biomarkers, with data that is widely available to the community. If we had a predictive tool it would make a difference in treatment, and those tools could contribute to precision medicine in cancer and other diseases.”
With machine learning expertise and projected investments in GPUs, CRC resources will continue to be part of Brain Big Data.
“In biomedical research,” says Huang, “the need for GPUs and machine learning will only keep growing.”