Q&A: Machine Learning and Explainable AI in Credit Risk
Ground-breaking modeling techniques fueled by machine learning and explainable artificial intelligence (xAI) are transforming the credit decisioning landscape. They are helping more consumers gain access to the credit they deserve so they can live their financial best. Likewise, they are enabling businesses to accept more creditworthy customers while holding their portfolio risk tolerance constant. As a result, businesses are expanding their portfolios and driving more profitable customer relationships.
However, these new machine learning modeling techniques differ from traditional methods and require some understanding to apply well. I interviewed Chris Yasko, Vice President of the Equifax Data Science Lab, to better understand how machine learning and xAI relate to credit decisioning.
Machine Learning 101
Chris, thanks for taking the time to answer these questions. Let’s start with the basics. What is machine learning?
Machine learning (ML) is a class of models that learn patterns implicitly from data and use them to make predictions, such as classifications. ML models are commonly used to predict future events based on current data. ML is a subfield of artificial intelligence (AI). There are different types of machine learning; neural networks, random forests and gradient boosting machines are popular techniques commonly tested on prediction problems.
Is one type of machine learning better than the others?
In theory, no. Neural networks, random forests and gradient boosting machines are all “universal approximators,” meaning that, given enough capacity, they can approximate any continuous function to arbitrary accuracy. In practice, every model is a simplification of reality and is never completely accurate. Therefore, all three can (and should) be tested to compare performance on a specific data set. When properly tuned, the three techniques usually perform comparably. By contrast, traditional logistic regression does not have the “universal approximator” property.
What are the dangers of using machine learning?
There are several things to keep in mind when incorporating machine learning into your business:
- Explainability: Credit risk use cases, particularly in the United States, require that models be explainable and actionable. Not all ML techniques fulfill this requirement.
- Overfitting: ML models are highly susceptible to overfitting, in which the model learns spurious relationships and noisy trends in the development data that do not hold up over time.
- Disparate Impact: ML models risk causing disparate impact by learning proxies for protected classes and prohibited bases. Their lack of transparency makes such proxies difficult to detect.
What is overfitting?
Overfitting occurs when a machine learning model is over-optimized to its development data. Performance measured on the development sample is excellent, but real-world prediction is poor. With an overfit model, statistical performance measures such as the Kolmogorov–Smirnov (KS) statistic or the Gini coefficient drop off dramatically over time, indicating poor model stability.
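To make the development-versus-production gap concrete, here is a minimal Python sketch (assuming NumPy) of the two statistics mentioned above, computed on a development sample and on a later holdout sample. It assumes the usual credit-score orientation, where a higher score means lower risk; the function names and the tiny data sets are invented for illustration and are not taken from any Equifax tool. A large drop from the development numbers to the holdout numbers is the overfitting symptom described above.

```python
import numpy as np


def ks_statistic(scores, is_bad):
    """Kolmogorov-Smirnov statistic: the largest gap between the cumulative
    score distributions of bad and good accounts."""
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    thresholds = np.unique(scores)
    # Share of bads and of goods scoring at or below each threshold
    cdf_bad = np.array([(scores[is_bad] <= t).mean() for t in thresholds])
    cdf_good = np.array([(scores[~is_bad] <= t).mean() for t in thresholds])
    return float(np.max(np.abs(cdf_bad - cdf_good)))


def gini(scores, is_bad):
    """Gini = 2 * AUC - 1, where AUC is the probability that a randomly chosen
    good account outscores a randomly chosen bad one (ties count as half)."""
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    good, bad = scores[~is_bad], scores[is_bad]
    wins = (good[:, None] > bad[None, :]).mean()
    ties = (good[:, None] == bad[None, :]).mean()
    return float(2 * (wins + 0.5 * ties) - 1)


# Hypothetical toy samples (1 = bad/defaulted, 0 = good)
dev_scores, dev_bad = [720, 700, 690, 610, 600, 580], [0, 0, 0, 1, 1, 1]
holdout_scores, holdout_bad = [720, 610, 690, 700, 600, 580], [0, 0, 1, 0, 1, 1]

for name, s, b in [("development", dev_scores, dev_bad),
                   ("holdout", holdout_scores, holdout_bad)]:
    print(f"{name}: KS={ks_statistic(s, b):.3f}, Gini={gini(s, b):.3f}")
# development: KS=1.000, Gini=1.000
# holdout: KS=0.667, Gini=0.778
```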
Explainable AI Basics
What is Explainable AI (xAI)?
Explainable AI (xAI) refers to techniques that let machine learning models explain what is happening inside the “black box.” Regulatory and business requirements often call for explanations of the key factors driving model predictions.
Are there different types of xAI?
Yes. The origins of xAI go back to the 1970s, with causal reasoning, rule-based and logic-based systems. As computing power has improved, more complex artificial intelligence techniques have become popular, though many of these are not transparent and require new innovations to fulfill explainability requirements. Even so, neural networks, gradient boosting machines and random forests are types of ML models that can be made explainable.
What are common mistakes when using xAI?
A common mistake is that the credit risk industry focuses on generating explanations after the model has been developed. Logical, actionable structure is not built into every type of ML model during development. As a result, consumers are at risk of harm even when these ML models appear to be explainable.
How Businesses Use Machine Learning and xAI
Why should a business consider using machine learning in its modeling?
Unlike logistic regression, machine learning automatically learns non-linear patterns and interactions in the data. This leads to more accurate models that make better predictions, allowing businesses to make better approve/decline decisions, more accurate risk-based pricing decisions or better account management decisions.
Which type of machine learning would work best in businesses?
The best way to find out is to test for performance increases, but you should also consider development time and your production environment. Random forests and gradient boosting machines are typically easier to develop because they require fewer hyperparameters to be tuned by the model developer. Neural networks are typically easier to implement in production (as a “scorecard of scorecards”) and run faster in production than gradient boosting machines.
How much better will machine learning models perform?
It depends. Model performance is a confluence of multiple factors including the modeling technique, the data used, and the skill of the modeler. Performance lift also depends on the use case. Risk prediction using data from the credit file is decades-old technology that is well understood by subject matter experts who rely on logistic regression methods tailored and tuned to extract significant performance from the data. If you see a 20 percent lift on credit data, or the promise of a 20 percent lift, be skeptical.
Where might a business see the best improvements in its portfolio?
Machine learning often provides superior lift in the tails of the distribution, meaning in scores near the bottom or top. These are often valuable use cases for businesses that require security deposits for low scorers, or make bundling offers to high scorers.
Will using machine learning limit the types of data a business can use?
No. Machine learning techniques can be used with any kind of data. While credit data is most commonly used by Equifax and our customers, alternative data combined with machine learning also holds a lot of promise in the US. That said, credit data in many countries is not as mature as credit data in the US. In those markets, there is an opportunity to apply machine learning, together with data enhancement and maturity, to extract performance improvements.
How Regulatory Compliance Impacts Machine Learning in Credit Risk
What is the Fair Credit Reporting Act (FCRA)?
The Fair Credit Reporting Act (FCRA) governs the collection, assembly and use of consumer report information in the United States. Originally passed in 1970, it is enforced by the U.S. Federal Trade Commission (FTC) and the Consumer Financial Protection Bureau (CFPB). Of particular interest here is the assignment of “key factors.” In many circumstances, a consumer reporting agency is required to provide key factors to the user of a score it provides, or to a consumer who requests a score. Under FCRA Section 609(f)(2)(B), key factors means “all relevant elements or reasons adversely affecting the credit score for the particular individual, listed in the order of their importance based on their effect on the credit score.” This information should be of value to a consumer, providing guidance on which areas he/she could work on to achieve a higher score and, with that, a better chance of approval or better credit pricing.
What is the Equal Credit Opportunity Act (ECOA) and Regulation B?
The Equal Credit Opportunity Act (ECOA, 1974) is part of the Consumer Credit Protection Act (1968). The Federal Reserve Board was originally responsible for drafting and interpreting its implementing regulation; today, the Consumer Financial Protection Bureau (CFPB) issues that regulation, “Regulation B,” pursuant to the ECOA. Regulation B prohibits creditors from discriminating against applicants on the basis of race, color, religion, national origin, sex, marital status or age. Regulation B applies to lending activities that take place within the U.S., whether or not the applicant is a citizen.
Are explanations alone enough to satisfy these regulations?
No, explanations alone do not protect a consumer from harm. With respect to the FCRA in particular, Equifax believes that the “key factors” provided to a consumer as explanations of his/her score should allow that consumer to improve the score as a direct result of acting on those key factors.
How can an incorrect explanation harm the consumer?
As an example, a common variable included in credit scores is “total balance on revolving cards.” This variable is very noisy: as the total balance increases, the observed default rate follows a general upward trend but fluctuates up and down along the way. An ML model will accurately capture these ups and downs, making the model more accurate than regression. However, with an incorrectly designed ML-based credit score, some consumers who decrease their total balance on revolving cards will see their ML-based credit score decrease. That outcome harms a consumer who acted logically.
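One way to screen a score for this failure mode is to sweep the attribute across a grid, hold everything else fixed, and flag every step where paying the balance down lowers the score. The Python sketch below assumes a hypothetical lookup of score by balance bucket; the function name and both score tables are made up for illustration and do not come from any real model.

```python
def harmful_paydowns(score_fn, balances):
    """Return (from_balance, to_balance) pairs where paying a balance down
    from the higher bucket to the next lower bucket lowers the score."""
    ordered = sorted(balances)
    violations = []
    for lower, higher in zip(ordered, ordered[1:]):
        # A logical paydown (higher -> lower balance) should never cost points
        if score_fn(lower) < score_fn(higher):
            violations.append((higher, lower))
    return violations


# Hypothetical scores by total revolving balance bucket (illustrative only)
UNCONSTRAINED = {0: 760, 2000: 735, 4000: 742, 6000: 715, 8000: 720, 10000: 690}
MONOTONE = {0: 760, 2000: 738, 4000: 738, 6000: 717, 8000: 717, 10000: 690}

print(harmful_paydowns(UNCONSTRAINED.get, list(UNCONSTRAINED)))  # [(4000, 2000), (8000, 6000)]
print(harmful_paydowns(MONOTONE.get, list(MONOTONE)))            # []
```

In practice, the same check would be run per consumer and per attribute that the model is expected to treat monotonically, with the other attributes held at that consumer's values.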
Is there a difference between an xAI model and the xAI explanations that come from it?
Yes, a big difference. Models and explanations are not the same thing; they are independent. Typically, an ML model is developed to make a prediction, and an explanation technique is then selected to generate explanations. Explanation techniques can be applied to any ML model.
What are some types of xAI explanations?
Points Below Max and Individual Conditional Expectation (ICE), Partial Dependence Plots (PDP), Local Interpretable Model-agnostic Explanations (LIME) and Shapley values (SHAP) are popular explanation techniques applied to ML models.
Do regulators specify what type of xAI can be used?
No. The Federal Reserve has provided guidance that “Regulation B neither requires nor endorses any particular method of credit analysis.”
Innovations that Leverage Machine Learning and xAI in the Credit Risk Industry
What has Equifax done to improve xAI?
Equifax has multiple patents issued in the U.S. and patents pending internationally for NeuroDecision® Technology (NDT). NeuroDecision is a regulatory-compliant neural network technology for risk decision applications. We use it in predictive models that require reason codes. Equifax also has patents pending using other machine learning techniques.
Why did Equifax focus on neural networks for xAI?
Neural networks are “universal approximators” (improved accuracy) that are easy to implement in production (scorecard of scorecards) and fast to run in production.
How does a NeuroDecision model work?
Just like traditional logistic regression models, only better. Data is used to build attributes, and the attributes are input into a NeuroDecision model. The NeuroDecision model outputs a three-digit numerical score, four reason codes, plus a fifth inquiry reason code.
How does NeuroDecision work better than logistic regression?
Two reasons. First, we can put more input variables into the NDT model to improve performance. Second, the NDT model can use non-linear interactions between variables for more predictive power.
Do unconstrained neural networks generate more predictive performance than constrained neural networks?
This is important, and often confusing. An unconstrained neural network may perform better statistically on the model training sample (showing a higher KS or Gini). However, the unconstrained neural network may overfit the data, leading to a false sense of performance. A constrained neural network has shown better stability over time, and hence better statistical predictive performance in production.
How is NeuroDecision Technology different?
We build structure into the model so that explanations are logical and actionable. If a consumer takes logical action on a returned explanation, their score will always improve. How? By monotonically constraining a neural network to enforce expected trends: it always rewards positive behavior and always penalizes negative behavior. The result is that NDT is a monotonically constrained, feed-forward, fully connected neural network that is fully explainable.
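The general idea behind a monotonically constrained network can be sketched in a few lines of Python with NumPy. This is not the NDT implementation, which this Q&A does not describe; it is a common construction under stated assumptions: inputs are pre-oriented so that a larger value always means better behavior (a balance, for example, would be negated), all weights are kept non-negative by taking absolute values, and the activations are increasing. Each sigmoid hidden unit then behaves like a small logistic scorecard, and the output combines them, echoing the “scorecard of scorecards” description. The function name and the random weights are placeholders, not trained values.

```python
import numpy as np


def monotone_forward(x, W1_raw, b1, w2_raw, b2):
    """One-hidden-layer, fully connected network that is non-decreasing in
    every input. Assumes inputs are oriented so larger = better behavior."""
    W1 = np.abs(W1_raw)  # input-to-hidden weights forced non-negative
    w2 = np.abs(w2_raw)  # hidden-to-output weights forced non-negative
    # Each sigmoid hidden unit acts like a small logistic "scorecard"
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    # The output combines the hidden scorecards with non-negative weights
    return hidden @ w2 + b2


# Random placeholder parameters: 3 inputs, 4 hidden units (not trained values)
rng = np.random.default_rng(0)
W1_raw, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
w2_raw, b2 = rng.normal(size=4), 0.0

x = rng.normal(size=(100, 3))
base = monotone_forward(x, W1_raw, b1, w2_raw, b2)
for j in range(3):
    improved = x.copy()
    improved[:, j] += 0.5  # better behavior on input j only
    # Tiny tolerance guards against floating-point rounding
    assert np.all(monotone_forward(improved, W1_raw, b1, w2_raw, b2) >= base - 1e-12)
print("no single-input improvement lowered the output")
```

Taking absolute values is only one way to enforce the constraint; a training loop might instead reparameterize the weights through a softplus or clip them at zero after each update.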
Can you prove NDT is better than traditional methods?
Yes, look at the data. NeuroDecision is a superior discriminator when compared to a traditional logistic regression model. In every instance, a properly tuned NDT model provides zero or positive lift over traditional methods. Read our white paper for real-life examples.
Does NDT satisfy the regulations?
Yes. For ECOA purposes, it is important to note that NeuroDecision uses one model (the same neural network) to generate both the risk score and the adverse action codes. It always rewards positive consumer behavior and always penalizes negative consumer behavior. Logical consumer action in response to returned model explanations always results in a score improvement.
Other Methods and Current Regulations
What are the relevant xAI regulations?
In the U.S., one must adhere to the Consumer Credit Protection Act (1968), Fair Credit Reporting Act (1970), Equal Credit Opportunity Act (1974), and the corresponding amendments. The Fair Credit Reporting Act (FCRA) Section 609(f)(2)(B) states: “… all relevant elements or reasons adversely affecting the credit score for the particular individual, listed in the order of their importance based on their effect on the credit score.”
The Equal Credit Opportunity Act (ECOA), as implemented by Regulation B (Reg B), states in the Supplement, Official Interpretations, Section 1002.9 - Notifications, Paragraph 9(b)(2)(4.): “If a creditor bases the denial or other adverse action on a credit scoring system, the reasons disclosed must relate only to those factors actually scored in the system.”
Does NDT satisfy the regulations?
Yes. For the FCRA, the NDT reasons that adversely affect the credit score are listed for the particular individual, in the order of their importance based on their effect on the credit score. For ECOA Reg B, NDT adverse action reasons relate only to those factors actually scored in the system. To meet the FCRA, NDT uses an explanation method based on Individual Conditional Expectation (ICE) plots, which drill down to individual observations. Using a constrained ICE technique, positive consumer behavior is always rewarded and negative consumer behavior is always penalized. ICE has been applied to traditional methods (logistic regression) as “Points Below Maximum” for many years, and NDT uses the same approach. To meet ECOA Reg B, NDT uses one model (the same neural network) to generate both the risk score and the adverse action reasons.
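The points-below-max idea can be sketched generically: for each attribute, vary only that attribute over its allowed values, record the best score reachable, and rank attributes by the gap between that best score and the consumer's actual score. The Python below is a hypothetical illustration, not Equifax's implementation; the function names, scorecard, attribute names, point values and applicant are all invented. Because the toy score is monotone in each attribute, the largest gaps point at changes that would actually raise the score.

```python
def points_below_max(score_fn, applicant, candidate_values, top_n=4):
    """Rank attributes by the points the applicant is below the best score
    reachable by changing only that attribute (all others held fixed)."""
    actual = score_fn(applicant)
    gaps = {}
    for attr, values in candidate_values.items():
        # ICE-style sweep: vary one attribute, keep the rest of the record
        best = max(score_fn({**applicant, attr: v}) for v in values)
        if best > actual:
            gaps[attr] = best - actual
    # Largest gap first, i.e., the most important adverse factor first
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def toy_score(a):
    """Hypothetical additive scorecard; attributes and points are invented."""
    score = 600
    score += 60 if a["utilization_pct"] < 30 else 20 if a["utilization_pct"] < 70 else 0
    score += 50 if a["months_since_delinquency"] > 24 else 10
    score += 40 if a["oldest_trade_years"] >= 10 else 15
    score += 30 if a["recent_inquiries"] == 0 else 0
    return score


applicant = {"utilization_pct": 80, "months_since_delinquency": 6,
             "oldest_trade_years": 12, "recent_inquiries": 2}
candidate_values = {"utilization_pct": [10, 50, 90],
                    "months_since_delinquency": [6, 36],
                    "oldest_trade_years": [2, 12],
                    "recent_inquiries": [0, 3]}

print(points_below_max(toy_score, applicant, candidate_values))
# [('utilization_pct', 60), ('months_since_delinquency', 40), ('recent_inquiries', 30)]
```

The same function works unchanged for a non-additive model such as a neural network, because it only calls the scoring function; the attribute with no gap (oldest trade, already at its best bucket) produces no reason.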
Does xAI “by proxy” explanation satisfy the regulations?
No. An explanation “by proxy” uses two models: one model (e.g., a neural network) to generate the score, and a separate “proxy” model (usually a regression) to generate the adverse action reasons. ECOA Reg B is very clear: “... the reasons disclosed must relate only to those factors actually scored in the system.”
Does xAI “Partial Dependence Plot” explanation satisfy the regulations?
No. Partial Dependence Plots (PDPs) explain at the portfolio level, not the individual level, whereas the FCRA clearly states “... adversely affecting the credit score for the particular individual.” A PDP shows the average prediction for each attribute value, computed by assuming that every observation in the data set has that value. PDPs are useful for model diagnostics, but should not be used for individual adverse action codes.
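The relationship between ICE and PDP fits in a few lines of Python: an ICE curve is one observation's prediction as a single attribute sweeps a grid, and the PDP is simply the average of those curves. The toy model and rows below are hypothetical, chosen so that the attribute pushes two individuals in opposite directions. The averaged PDP comes out flat even though the attribute matters for each individual, which is why a portfolio-level plot cannot stand in for an individual explanation.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per observation: its prediction as `feature` sweeps `grid`
    while every other attribute keeps its observed value."""
    return [[predict({**row, feature: v}) for v in grid] for row in rows]


def partial_dependence(predict, rows, feature, grid):
    """PDP: the average of the ICE curves at each grid value."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_predict(row):
    # Hypothetical model with an interaction: x helps when z = 1, hurts when z = -1
    return row["x"] * row["z"]


rows = [{"x": 1, "z": 1}, {"x": 1, "z": -1}]
grid = [0, 1, 2]
print(ice_curves(toy_predict, rows, "x", grid))          # [[0, 1, 2], [0, -1, -2]]
print(partial_dependence(toy_predict, rows, "x", grid))  # [0.0, 0.0, 0.0]
```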
Does xAI “LIME” explanation satisfy the regulations?
No. Local Interpretable Model-agnostic Explanations (LIME) is a modified proxy model technique that generates local explanations “in the neighborhood” of the prediction we want to explain. As a result, LIME does not explain the model globally. More importantly, LIME generates explanations from a proxy model, and as stated above, proxy explanations do not meet ECOA Reg B.
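For readers unfamiliar with LIME, its core mechanism can be sketched with NumPy: perturb the input around the point being explained, weight the samples by their closeness to it, and fit a weighted linear model. This is a simplified from-scratch illustration, not the API of the open-source `lime` package (which adds steps such as feature selection and discretization); the function names, kernel, sample count and perturbation scale are arbitrary assumptions. The point is that the returned coefficients belong to the surrogate, not to the scoring model itself.

```python
import numpy as np


def lime_style_explanation(predict, x0, n_samples=500, scale=0.5,
                           kernel_width=1.0, seed=0):
    """Simplified LIME-style local surrogate: returns the slopes of a
    proximity-weighted linear model fitted around x0."""
    rng = np.random.default_rng(seed)
    # Perturb the point being explained
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.size))
    y = predict(X)
    # Gaussian-style proximity weights: nearby samples count more
    dist = np.linalg.norm(X - x0, axis=1)
    sqrt_w = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))
    # Weighted least squares with an intercept column
    A = np.column_stack([np.ones(n_samples), X - x0])
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[1:]  # slopes of the surrogate, not of the scoring model


def linear_predict(X):
    # Hypothetical scoring function, linear so the surrogate can match it exactly
    return X @ np.array([2.0, -1.0]) + 3.0


def curved_predict(X):
    # Hypothetical non-linear scoring function; the surrogate only approximates it
    return X[:, 0] ** 2 + X[:, 1]


x0 = np.array([1.0, 2.0])
print(lime_style_explanation(linear_predict, x0))  # close to [2, -1]
print(lime_style_explanation(curved_predict, x0))
```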
Does xAI “SHAP” (Shapley Values) explanation satisfy the regulations?
Yes and no. Yes, if Shapley values are applied to logical and actionable (constrained) models; then the Shapley value explanations are also logical and actionable. No, if Shapley values are applied to an unconstrained model; in that case, SHAP explanations may harm the consumer with incorrect adverse action reasons.
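Exact Shapley values can be computed by brute force when there are only a few features, which shows that SHAP-style attributions are derived entirely from the model's own outputs and therefore inherit whatever behavior the model has, constrained or not. The Python sketch assumes a single reference (baseline) applicant whose values stand in for “absent” features; that choice, the function names and the toy model are illustrative assumptions. Enumeration grows exponentially with the number of features, so practical SHAP implementations rely on faster approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating feature subsets. Features outside
    a subset take the (hypothetical) baseline applicant's values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score with an interaction between the two features
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


x, baseline = [1, 1], [0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [2.5, 3.5]
# Efficiency property: attributions add up to the score difference
print(sum(phi), toy_model(x) - toy_model(baseline))  # 6.0 6
```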