Post 2020, financial inclusion is the common theme
This week, top data and analytics experts from Equifax and around the world will discuss credit scoring and related topics at the prestigious biennial conference, Credit Scoring and Credit Control XVII. Equifax teams from across the globe will participate by presenting 14 papers, more than any other organization in attendance.
Following 2020, a year unlike any other, our data and analytics group has been hard at work researching new data sources, AI-fueled technologies, modeling techniques, and analytic approaches. This work aims to financially empower businesses and consumers in a pandemic-influenced market that’s moving at breakneck speed. In these papers, audiences will discover how Equifax analytic scholars and professionals from around the world are thinking differently and challenging the status quo by exploring smarter ways of remediating model bias, predicting default probability, delivering inclusive yet explainable scores, reducing customer churn, and more.
We’re on a mission to help banks and lenders confidently serve a wider audience of consumers, particularly those who are underbanked and underserved, by giving them greater access to the financial products they need to live their best lives. Keep reading for a quick summary of the latest research papers scheduled to be presented at the conference, or you can view past papers here.
Wednesday, August 25, 2021
1. Frameworks for Testing Model BIAS
Presenters: Marcus Bruhn, General Manager Data Science at Equifax; Swathi Veeravelly, Head of Models and Attributes
Country: Australia/New Zealand
Summary: Australian legislation prohibits discrimination on the basis of race, color, sex, religion, age, political opinion, and other factors. This presentation will consider model bias, where outcomes are systematically less favorable to individuals within a particular group and no relevant difference between groups justifies such harm. To assess model bias, we collect a sample of credit applications scored with the model, together with the target outcome and the protected attributes (age, gender, religion, country of birth, etc.). Next, we explore how a process was developed to pick up potential signals of indirect bias in the models so that analysts can review, explain, and remediate when necessary, and we share how this was approached for our recent score model developments in Australia.
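As a rough illustration of what a "signal" of bias can look like, the sketch below compares the rate of favorable outcomes between groups on a scored sample. This is a generic ratio check, not the presenters' framework; the group labels, scores, cut-off, and function names are all hypothetical. A low ratio is only a prompt for analyst review, since a relevant difference between groups may explain it.

```python
from collections import defaultdict

def approval_rates(records, threshold):
    """Share of applicants scoring at or above the cut-off, per group.

    records: iterable of (group_label, score) pairs (hypothetical data).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, score in records:
        counts[group][1] += 1
        if score >= threshold:
            counts[group][0] += 1
    return {g: approved / total for g, (approved, total) in counts.items()}

def approval_rate_ratios(rates, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    ref = rates[reference_group]
    return {g: r / ref for g, r in rates.items()}

# Hypothetical scored applications: (protected-attribute group, score).
sample = [("A", 700), ("A", 640), ("A", 720), ("A", 580),
          ("B", 610), ("B", 690), ("B", 560), ("B", 590)]

rates = approval_rates(sample, threshold=600)
ratios = approval_rate_ratios(rates, reference_group="A")
```

A ratio well below 1 for a group flags the model and cut-off for closer review; it does not by itself establish bias.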
2. Assessing Disparate Impact for Machine Learning Credit Scoring Models: The Available Data Do Not Meet Analytical Requirements
Presenter: Terry Woodford, Senior Data Scientist at Equifax
Country: USA
Summary: Generally, in lending, disparate impact is defined as a situation in which a lender applies a policy or practice equally to all applicants, but it results in a disproportionate adverse impact on applicants from a protected class or prohibited basis. However, in the context of credit scoring models disparate impact means that an attribute’s predictive power in a model does not predict performance within a protected class, but rather it serves as an instrument for the class. This effect has been evaluated in the context of logistic regression, but not in the context of AI and machine learning credit scoring models.
This paper summarizes prior research and evaluates the data required to assess disparate impact in machine learning credit risk models. The demographic ethnicity data used in the original analyses are commercially unavailable. Since the original analyses were reported in 2007 and 2010, proxy measures for certain protected classes have been publicized. Specifically in this paper, we examine the Bayesian Improved Surname Geocoding (BISG) proxy measure for ethnicity and compare it to commercially available self-reported demographic data using a sample of roughly 300 million records.
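For readers unfamiliar with BISG, the core idea is a Bayes update: a surname-based probability for each group is combined with how likely a member of that group is to live in the applicant's area, assuming surname and geography are independent given the group. The sketch below shows that update only; the group names and all probabilities are made up for illustration and are not taken from the paper or from any real reference table.

```python
def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(group) is proportional to
        P(group | surname) * P(geography | group),
    normalized so the probabilities sum to 1.
    """
    unnormalized = {
        group: p_group_given_surname[group] * p_geo_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}

# Hypothetical inputs for one applicant.
surname_prior = {"group_1": 0.6, "group_2": 0.3, "group_3": 0.1}
geo_likelihood = {"group_1": 0.001, "group_2": 0.004, "group_3": 0.002}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
```

In this made-up case the geography evidence outweighs the surname prior, so the most likely group changes after the update. Comparing such proxy probabilities against self-reported data is the kind of check the paper describes.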
3. IFRS9 Probability of Default and Economic Forecasts
Presenters: Monika Leng, Data Scientist at Equifax UK; Vassilis Ioannou, Head of Data Science at Equifax UK
Country: United Kingdom
Summary: The estimation of expected losses under IFRS9, an International Financial Reporting Standard published by the International Accounting Standards Board, should be unbiased, relying on reasonable and supportable information and assessing credit risk conditional on current and future economic conditions. The current practice in the UK is to use multiple scenarios for future economic conditions, starting from externally available economic forecasts or those created by internal economic teams and simulation processes. The impact of the economic forecasts used in the estimation process on the final impairment estimate varies with the specific methodology employed by each bank; however, the overall magnitude was recently underlined by the abrupt increase in credit loss provisions of many major banks on the eve of the COVID-19 crisis.
This study initially focuses on key economic variables commonly used by most banks to capture future economic conditions, and on the historical performance of publicly available forecasts, including around times of crisis. It then attempts to estimate the magnitude of the impact that the forecasting accuracy of predicted economic conditions may have on the forecasting accuracy of the models used to estimate probability of default under IFRS9.
Thursday, August 26, 2021
4. Rumble in the Decision Engine: Credit versus Bank Account Information
Presenters: Fiona Gatchell, Head of Transactional Analytics at Equifax; Harvey Lawrence, Lead Analyst at AccountScore, an Equifax company
Country: United Kingdom
Summary: With the emergence of bank account information in credit decisioning, what are the most predictive data types in different credit risk models? This paper explores multiple aspects of the relationship between credit and bank account information, covering topics such as:
- Where, when and how can bank account information data outperform traditional credit data?
- Is there any statistical evidence that bank account information can identify signs of financial difficulty ahead of traditional bureau data?
- How long does it take a consumer to develop a robust credit score using traditional data, and how long using bank account information?
- What are the features of a bank account score? How stable is a bank account information score?
- Global scoring - can bank account information scores cross borders?
5. The Impact of Income Shocks to Probability of Default Estimates
Presenters: Vassilis Ioannou, Head of Data Science at Equifax UK; Raffaella Calabrese, Senior Lecturer in Data Science at Business School University of Edinburgh
Country: United Kingdom
Summary: Credit scores have long been used for ranking the risk of loans to support credit strategies and as an input to probability of default estimates for loss provisioning purposes. However, the ascent of additional data sources gives lenders the opportunity to improve their risk estimates by quantifying important information that becomes available either intermittently or only on occasion. One such case is when an income shock is identified and lenders require revised risk estimates, both to better support their clients under schemes such as the payment holidays implemented during the recent crisis and to update their loss provisioning estimates. This presentation will share an approach that can be used to estimate updated probabilities of default, conditional on traditional credit scores, variables, and the identified income-shock event.
6. "Surprise, We're Bankrupt" - Finding Short-Term Unexpected Bankruptcies
Presenter: Ramesh Sankaranrayanan, Manager Advanced Analytics at Equifax Canada
Country: Canada
Summary: Prior to the COVID-19 pandemic, the Canadian credit market was undergoing significant change. Debt-to-income levels were stuck at historically high levels after a prolonged period of credit growth. Government regulations had slowed the mortgage market in recent years, but non-mortgage products remained strong. Lines of credit, particularly home equity lines (HELOC), represented 45% of the non-mortgage debt outstanding. As interest rates started rising in 2017, those carrying large balances on lines of credit felt the strain given the use of variable rates.
These conditions drove a noteworthy shift in the market, ultimately leading to a breakdown in the connection between bankruptcies and delinquencies that caught many lenders off guard. The existing bankruptcy scores still worked well. However, a substantial portion of bankruptcies was being missed, and traditional loss models were not accounting for these losses effectively. After a number of discussions with lenders, Equifax started to model early indicators for these “surprise” bankruptcies, developing a short-term insolvency score complementary to existing models. This presentation explores the early trends that drove the need and outlines the key data inputs and modelling used to identify the surprise bankruptcies. While this was a Canadian event, the implications of the surprise bankruptcy model are applicable to many markets with high levels of indebtedness.
7. Comparative Analysis of Machine Learning Credit Risk Model Interpretability: Model Explanations, Reasons for Denial and Routes for Score Improvements
Presenters: Dr. Michael McBurnett, Distinguished Scientist at Equifax; Dr. Federico Sembolini, Lead Data Scientist at Equifax; Dr. Matthew Turner, Fellow & Distinguished Mathematical Statistician at Equifax; Dr. Lewis Jordan, Senior Mathematical Statistician at Equifax; Dr. Howard Hamilton, Senior Data Scientist at Equifax; and Dr. Sergio Rodríguez Torres, Data Scientist at Equifax
Country: USA, Spain
Summary: This presentation examines machine learning credit risk model explanations in the context of constrained and unconstrained model construction methods and regulatory requirements in the United States and Europe. In the United States, credit risk model explanations are used to inform consumers why they were denied credit, to alert consumers as to problematic data on their credit file, and to assess fair lending. However, in Europe requirements are more stringent. Recent (2020, Section 53) European Banking Authority Guidelines on loan origination and monitoring state that the principle of explicability of algorithms is critical, which requires the ability to interpret the model completely.
Four risk modeling cases are explored in detail, and it is demonstrated that the unconstrained models produce nonsensical explanations for consumers. Conversely, the monotonically constrained machine learning models produce logical and actionable information that consumers can use to understand how they received the assigned score, why credit was denied, and how to improve the credit score. In addition, the constraint also produces explanations useful for modelers, risk managers, and regulatory reviewers who need to understand the internal workings of these models.
8. Actionable and Feasible Consumer Credit Score Improvement Paths with Optimality Constraints and Explanations
Presenters: Dr. Matthew Turner, Fellow & Distinguished Mathematical Statistician at Equifax; Dr. Lewis Jordan, Senior Mathematical Statistician at Equifax; Dr. Stephen Miller, Principal Data Scientist at Equifax
Country: USA
Summary: Credit score simulators have historically been trial-and-error based processes, requiring the consumer to experiment with various “what if” scenarios to evaluate the impact a set of actions may have on their credit score. For example, what if I apply for a new car loan, what if I decrease my credit card balances, or what if I remove a bankruptcy? However, a consumer may be interested in improving their score to a specific value in order to reach a minimum approval score threshold, or to qualify for a better pricing offer.
In this presentation, we describe a method for constructing an optimal path that explicitly navigates an individual consumer through the model feature space from their current score to a score of their choosing over monthly intervals. Optimality is achieved by minimizing the distance the consumer must travel within the feature space, subject to a given score constraint. Boundary box constraints are used to ensure the path stays within the attribute domain. The algorithm also uses integrated gradients to explain to the consumer which model features contributed most to their score improvement.
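To make the idea concrete, here is a minimal sketch of the same ingredients on a toy problem: the nearest point that reaches a target score inside box limits, a path split into monthly steps, and per-feature attributions. It assumes a linear score, which is a simplification chosen here so the answer can be found by bisection and so integrated gradients reduce to weight times feature change; the presenters' method targets machine learning models. All feature names, weights, and limits are hypothetical.

```python
def score(x, w, b):
    """Linear score: intercept plus weighted sum of features."""
    return b + sum(wi * xi for wi, xi in zip(w, x))

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def closest_point_reaching(x0, w, b, target, lo, hi, iters=100):
    """Euclidean-nearest point to x0, inside the box, with score >= target.

    For a linear score the solution has the form clip(x0 + lam * w) for
    some lam >= 0, so lam is found by bisection. Assumes x0 lies in the box.
    """
    def candidate(lam):
        return [clip(x + lam * wi, l, h)
                for x, wi, l, h in zip(x0, w, lo, hi)]

    if score(x0, w, b) >= target:
        return list(x0)
    # Best achievable corner of the box, used as a feasibility check.
    best = [h if wi > 0 else l for wi, l, h in zip(w, lo, hi)]
    if score(best, w, b) < target:
        raise ValueError("target score is not reachable inside the box")

    lam_lo, lam_hi = 0.0, 1.0
    while score(candidate(lam_hi), w, b) < target:
        lam_hi *= 2.0
    for _ in range(iters):
        mid = (lam_lo + lam_hi) / 2.0
        if score(candidate(mid), w, b) < target:
            lam_lo = mid
        else:
            lam_hi = mid
    return candidate(lam_hi)

def monthly_path(x0, x_target, months):
    """Equal straight-line steps from x0 to x_target, one per month."""
    return [[a + (t - a) * k / months for a, t in zip(x0, x_target)]
            for k in range(1, months + 1)]

def linear_integrated_gradients(x0, x1, w):
    """For a linear score, integrated gradients equal w_i * (x1_i - x0_i)."""
    return [wi * (b1 - a) for wi, a, b1 in zip(w, x0, x1)]

# Hypothetical features scaled to [0, 1]: card utilization, on-time share.
w, b = [-100.0, 50.0], 650.0
x0 = [0.8, 0.2]                      # current position, score 580
lo, hi = [0.0, 0.0], [1.0, 1.0]      # attribute domain

x_star = closest_point_reaching(x0, w, b, target=620.0, lo=lo, hi=hi)
path = monthly_path(x0, x_star, months=3)
attributions = linear_integrated_gradients(x0, x_star, w)
```

In this toy case the attributions add up to the full score change, which is the completeness property that makes integrated gradients convenient for explaining a score improvement.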
Friday, August 27, 2021
9. Self-Organising Maps for Multiple Scores
Presenters: Marcus Bruhn, General Manager Data Science at Equifax; Warren DuPreez, Data Scientist
Country: Australia/New Zealand
Summary: In this presentation, self-organizing maps (SOMs) are explained in contrast to the score-of-scores and cut-off matrices that are widely used in credit decisioning when multiple scores are to be used together (e.g., a bureau score and a bespoke application score). While the score-of-scores approach may allow for a more efficient strategy due to greater discrimination ability, self-organizing maps (and cut-off matrices) allow for flexible strategy development with respect to component scores, as well as useful visualizations of a given customer base in terms of risk profile. Acknowledging this trade-off, we also propose a proxy for the Gini metric that can be calculated for self-organizing maps. This enables an understanding of the price in discrimination ability that must be paid in order to gain the benefits of the self-organizing maps approach.
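The proxy metric itself is not described in this summary. As background, the sketch below computes the standard Gini coefficient (twice the area under the ROC curve, minus one) and shows how discrimination can drop when a continuous score is replaced by a coarse grouping, which is the kind of "price" the summary refers to. The scores, outcomes, and the two-band grouping are hypothetical and stand in for map cells only loosely.

```python
def gini(scores, is_bad):
    """Gini = 2 * AUC - 1, where higher scores should indicate lower risk.

    AUC is the share of (good, bad) pairs in which the good account has
    the higher score; ties count as half.
    """
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    concordant = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                concordant += 1.0
            elif g == b:
                concordant += 0.5
    auc = concordant / (len(goods) * len(bads))
    return 2.0 * auc - 1.0

# Hypothetical accounts: score and whether the account went bad.
scores = [720, 680, 650, 610, 590, 560]
is_bad = [0, 0, 1, 0, 1, 1]

# Coarse grouping: 1 if the score is at least 600, else 0.
bands = [1 if s >= 600 else 0 for s in scores]

gini_continuous = gini(scores, is_bad)
gini_banded = gini(bands, is_bad)
```

Here the banded version ranks fewer good/bad pairs correctly because accounts in the same band are tied, so its Gini is lower than that of the continuous score.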
10. Using Credit Bureau Data and Machine Learning Algorithms to Predict Churn Scores in the Insurance Industry
Presenters: Dr. Federico Sembolini, Lead Data Scientist at Equifax Iberia; Cesar David Iglesias Perez, Analytics and Data Science Manager at Equifax Madrid; Francisco Ruiz, Data Scientist at Equifax Iberia; Dr. Sergio Rodríguez Torres, Data Scientist at Equifax Iberia; Juan Antonio Roldan, Data Scientist at Equifax Madrid
Country: Spain
Summary: Customer churn is a common problem that most companies need to address, but it is particularly important in the insurance industry, as customers usually renew their contract on a yearly basis, and they are therefore more likely to consider alternatives with more favorable conditions. At Equifax, we have developed a score model based purely on credit bureau information and socio-demographic variables to predict customer churn in the insurance industry. The algorithm uses machine learning, namely XGBoost, and we show how this technique can improve on the results of algorithms traditionally used in credit risk models, such as logistic regression. This presentation focuses on three main points: how we developed a churn model for the insurance industry using credit bureau data; how we used micro-aggregation to ensure statistical anonymization; and how we compared the output of XGBoost to that of logistic regression and found the former to be more predictive.
11. Protecting the Foundation - Retaining Your Mortgage Portfolio
Presenter: Bill Johnston, Vice President of Data & Analytics at Equifax Canada
Country: Canada
Summary: The Canadian housing market has been on a rollercoaster. After a significant drop in activity in recent years, the COVID-19 pandemic reignited the market and has lenders concerned about rapid price acceleration. Further, in Canada, 20% of a mortgage portfolio renews in any given year, which presents a significant risk to lenders during a competitive market. The risk of lenders poaching from each other was the primary reason most refused to report their mortgages to the credit bureau and why they limit the information available.
This presentation describes how Equifax partnered with lenders to build a predictive model to rank the likelihood of an account changing to another bank. This includes the use of mortgage inquiry data and previous buying channels. The models leverage machine learning capabilities to identify those accounts most at risk for attrition and can account for end-of-term renewals and in-term refinancing. Then, we will explain how lenders can leverage the scores to manage their renewal process, contact the riskiest customers early to minimize potential competitive offers, better target their marketing investment, and strengthen their portfolio by reducing leakage.
12. Credit Risk Models for Unbanked People - Latin America
Presenter: Matias Karmelic, Latin American Analytics Director at Equifax
Region: Latin America
Summary: One of the biggest challenges of the financial industry is getting people who are unbanked into the system. At Equifax, we have made great efforts to improve the performance of our models for the unbanked, who in many Latin American countries represent almost half of the population.
This presentation shares how Equifax has achieved an average 20% improvement in the predictive capacity of our models, which is enabling our financial customers to securely serve a wider audience of underserved and underbanked populations. Specifically, we’ll explain considerable enhancements made to the depth, breadth, and quality of the data used, including the addition of exciting new sources that comprise geographic information, socioeconomic conditions, and familial environments. We will also discuss the use of new algorithms, such as RandomForest, XGBoost, or LightGBM, which allow us to increase predictive capacity by better capturing interactions between the variables.
13. On the Development of Repeated Measure Models for Predictive Credit Scoring
Presenters: Dr. Howard Hamilton, Senior Data Scientist at Equifax; Dr. Jeff Dugger, Principal Data Scientist at Equifax
Country: USA
Summary: Consumer credit risk models have been built from credit attributes measured at a single instant of time, yet credit bureaus maintain credit data archives for a given consumer that go back many months or years.
In this presentation, we propose a risk model based on repeated measures of data; that is, consumer credit behaviors that have been measured at a minimum of two instants in time. We share how to formulate a convolutional neural network model that combines feature learning with the feature classification of a logistic regression model to create consumer credit scores, as an alternative to feature engineering, which requires careful consideration by subject matter experts. We apply L1 regularization to the logistic regression classifier to determine learned features that are important to the model output. Then, we demonstrate the approach on credit scoring of revolving debt in the U.S. consumer population and show that L1 regularization reduces the complexity of the model by an order of magnitude without degrading model performance.
14. Can Your Customer Afford Additional Credit? Understanding Consumer Capacity
Presenter: Rebecca Oakes, AVP Advanced Analytics at Equifax Canada
Country: Canada
Summary: A surge in government benefits has pushed disposable income and savings higher, which has allowed debt-to-income measures to fall from historically elevated levels. Despite that improvement, many Canadian consumers remain heavily indebted and regulators are turning their attention to assessing a consumer’s ability to manage any additional debt commitments. This will be increasingly important as historically low interest rates have delivered a red-hot housing market that is driving significant growth in mortgage balances, with many likely to face higher borrowing costs when it comes time to renew in the next five years.
The presentation focuses on the definition of capacity and how the estimation tools were able to effectively identify individuals, with real-world examples of the analysis applied. During the presentation, we will share how several modeling techniques were tested and demonstrate the applicability of the data to measuring capacity, including the holistic use of credit bureau data with lender-specific data, credit card payment data, falling payment rates, and long-term trended data.