Deep Learning to Predict Default

Stefania Albanesi

Two years ago Pitt economist Stefania Albanesi co-authored a paper that garnered a lot of attention. "Credit Growth and the Financial Crisis: A New Narrative" used a large data set on debt and credit scores to refute conventional wisdom about the 2007-2009 global credit crisis – that the crisis was triggered by mortgage defaults by US subprime borrowers with low credit scores. What Albanesi found in the data was that borrowers with higher credit scores accounted for an outsized percentage of mortgage defaults. Computation supporting the paper was done at the Center for Research Computing.

This paper was widely read and quoted. Albanesi, who came to Pitt from the New York Federal Reserve, has presented at conferences internationally; she is regularly quoted in the media as an expert on debt and default.

“A New Narrative” posited a historical reinterpretation based on a straightforward conclusion – credit scores did not accurately predict the rate of mortgage default in 2007-2009. Thinking beyond history, Albanesi and graduate student Domonkos Vamossy asked two questions: Why didn’t credit scores predict default particularly well? Can one develop new models of credit scores that better predict default?

Albanesi and Vamossy offer a path toward a new type of credit score in “Predicting Consumer Default: A Deep Learning Approach,” a working paper published in August by the National Bureau of Economic Research.

The paper presents a deep learning model to predict consumer default that outperforms standard credit scoring models in accuracy while relying on the same data. Albanesi and Vamossy argue that the new model provides policy makers insights into policies targeted at reducing default and alleviating the burden on borrowers and lenders – as well as offering guidance on the kind of large-scale risks that brought on the credit crisis.

“So little work has been done to see if the assumptions of the credit scores were correct,” explains Albanesi. “So much in people’s lives now is dependent on credit scores – even getting a job or renting an apartment – and we don’t know what is in the model. Few researchers have the data. Only two companies – FICO and VantageScore – develop the scoring models but they are not obligated by law to reveal more than four determinants of the score.”

“We asked if we can build a model with meaningful probabilities, not just a score, that is better predictive of individual risk, and also risk in the aggregate.”

Albanesi and Vamossy sought to create a score that can better predict default risk and benefit both borrowers and lenders. Some consumers could score better than they do under the existing system. Lenders could make loans with a lower rate of default. “The customer for the credit score is the lender,” Albanesi emphasizes. “They want more accurate forecasting.”

Albanesi and Vamossy turned again to CRC, this time using Graphics Processing Unit (GPU) resources to develop machine learning models to compute an expanded range of thousands of variables in the credit scores.

“With the parallel capability at the center, we could send multiple codes to multiple nodes simultaneously, training different models on different nodes,” Vamossy explains. “It is much better use of computing time. The GPU cards are 10 to 20 times faster than standard computing cores. Our largest model trained in one day: it would have run for a week on a machine not using GPUs. The GPUs also have larger memory.”

Albanesi adds, “We cannot do it without the high-performance cluster. We use a hybrid model that had to run many alternative models and without the GPUs it would be too costly. CRC at that scale is just right for us.”

Several findings in the paper question standard practice. One factor negatively affecting credit scores in standard models is utilization – the more a borrower uses credit, the worse the credit score. The new model suggests that in predicting the likelihood of default, utilization is not as important as total amount owed. Total amount owed is likewise a better predictor than payment history.

“There is a very large difference between our AI credit model and the conventional credit score,” Vamossy explains. “Our agenda is to use new tools with the rigor of economics to highlight the advantages associated with new technology. The new model has very high predictive power. More testing should be done to continue to update the model, exploring implicit biases and other potential issues.”

Commercial lenders are interested in the new credit score model for individual borrowers. The new model could also provide insight to policy makers about aggregate, systemic risk. Albanesi presented the findings of the paper at the European Central Bank in July. “They were super interested. They saw the potential for macroeconomic risk assessment.”

“We want to bring in more factors than the credit raters now use, and with more transparency,” Albanesi says. “We have created one new model – maybe two competing models could lead to better models. We want to develop the best prediction possible using the best technology available.”


Read “Predicting Consumer Default: A Deep Learning Approach” at


Brian Connelly
Pitt Center for Research Computing

Friday, September 4, 2020