What Types of Analytics Intelligence Can Power Identity Trust?

December 09, 2020 | Sriram Tirunellayi

Consider Performance Criteria, Algorithms and Labeled Data

Fraud prevention is a challenging task, given the mercurial nature of the problem. Last week’s modus operandi may no longer be relevant today since fraudsters continually change and adapt. In my previous blog article, I described the data-to-decisions journey as it pertains to identity and fraud. Now, let’s dive into identity trust solutions that are increasingly powered by advanced analytics like artificial intelligence (AI) and machine learning (ML).

Machine learning methods are well suited to identity trust because they can work with huge volumes of data from multiple, diverse sources. The models can be trained to learn, adapt and detect evolving patterns. When designed and trained properly, they can keep learning as new data arrives in the form of feedback outcomes. They can also isolate data points that deviate from known safe patterns, which helps uncover new fraud patterns.

There are several key considerations in the optimal design of any machine learning model. Here, we will focus on the following three points to illustrate the relevance of AI and ML models for identity and fraud.

The Economics of Identity Trust Decisions

As described in an earlier blog article, identity trust decisions made during an interaction can affect the customer experience and influence the customer’s perception of the brand and future interactions. In practical terms, this translates into three costs: operational costs, fraud losses and opportunity costs. The time and effort required to validate information count as operational costs. Customer friction, whether from false positives or from extra authentication steps during sign-up, drives opportunity costs.

Given the complex, multi-faceted nature of trust decisions, machine learning models leverage a mix of intelligent data, behavior patterns and signals to make the best identity trust decision for every transaction. The goal of each decision is to minimize fraud losses while balancing operational costs against opportunity costs.
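To make the trade-off between the three costs concrete, here is a minimal sketch of an expected-cost decision rule. The class and function names, the three actions and every dollar figure are illustrative assumptions and do not come from any production system. The sketch also assumes that a step-up challenge stops all fraud, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Costs:
    # All figures are hypothetical and exist only for illustration.
    fraud_loss: float            # loss if a fraudulent request is approved
    review_cost: float           # operational cost of one step-up or manual review
    opportunity_cost: float      # value lost when a genuine customer walks away
    step_up_abandon_rate: float  # share of genuine customers who abandon a step-up


def expected_costs(p_fraud: float, costs: Costs) -> Dict[str, float]:
    """Expected cost of each action for a request with fraud probability p_fraud."""
    p_genuine = 1.0 - p_fraud
    return {
        "approve": p_fraud * costs.fraud_loss,
        # Simplifying assumption: a step-up stops all fraud but loses some genuine customers.
        "step_up": costs.review_cost
        + p_genuine * costs.step_up_abandon_rate * costs.opportunity_cost,
        "decline": p_genuine * costs.opportunity_cost,
    }


def best_action(p_fraud: float, costs: Costs) -> str:
    """Return the action with the lowest expected cost."""
    by_action = expected_costs(p_fraud, costs)
    return min(by_action, key=by_action.get)


costs = Costs(fraud_loss=500.0, review_cost=5.0,
              opportunity_cost=100.0, step_up_abandon_rate=0.1)
for p in (0.001, 0.05, 0.99):
    print(p, best_action(p, costs))
```

With these assumed figures, a very low fraud probability is approved, a middling one is stepped up and a very high one is declined. Changing any of the cost inputs moves those boundaries, which is the point of treating the decision as an economic one.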

The Dynamic Nature of Fraud

Fraud is a rare event, and it is not uncommon to see fraud rates of less than 50 basis points (0.5 percent). In some situations, fraud labels may be messy, costly and time consuming to obtain. In other situations, there may be only a handful of confirmed cases or anecdotal examples with which you can work. Fraud patterns also vary over time as fraudsters change their attack vectors and adapt to new defenses that institutions deploy. Machine learning models can adapt to these changing patterns by assessing a continuous feed of data and signals about a consumer’s past interactions, present context and predicted intent, and by adjusting intelligently as feedback outcomes arrive.
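A quick piece of arithmetic shows why such rarity matters for modeling. The sketch below uses hypothetical function names and an assumed population of one million. It converts basis points to a rate and scores a trivial "model" that never flags anything.

```python
def bps_to_rate(basis_points: float) -> float:
    """Convert basis points to a fraction (1 basis point = 0.01 percent)."""
    return basis_points / 10_000


def always_genuine_baseline(population: int, fraud_bps: float):
    """Accuracy and recall of a 'model' that never flags anything as fraud."""
    fraud_cases = round(population * bps_to_rate(fraud_bps))
    accuracy = (population - fraud_cases) / population
    recall = 0.0  # it catches none of the fraud cases
    return fraud_cases, accuracy, recall


# Assumed example: one million applications at a 50 basis point fraud rate.
cases, accuracy, recall = always_genuine_baseline(1_000_000, 50)
print(cases, accuracy, recall)
```

At 50 basis points, doing nothing is 99.5 percent accurate yet captures no fraud at all, so plain accuracy says very little about a fraud model.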

The Importance of Problem Framing

Machine learning models do not exist in a vacuum. They exist to serve a specific business problem; therefore, an accurate articulation of the business problem is a fundamental requirement for the correct formulation of the machine learning model. It’s important to answer critical questions up front so that relevant design parameters, such as training population, sampling and weighting schemes, segmentation and even appropriate algorithms, can be chosen wisely. For example, you might pose the following questions:

  • Are you looking to replace an existing fraud model or augment the model as an additional layer of defense?
  • Would you like to predict behaviors of a fraudster or capture patterns of fraud victims?
  • Are there any biases in the label definition that need to be accounted for?
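Sampling and weighting schemes are among the design parameters that these answers drive. As one hedged illustration, the sketch below keeps every fraud case, downsamples the non-fraud cases and assigns weights so that the weighted sample still reflects the original population. The function name, keep rate and data are assumptions made for the example, and this is only one of many possible schemes.

```python
import random


def downsample_with_weights(labels, keep_rate, seed=0):
    """Return (index, weight) pairs for a training sample.

    Every fraud case (label 1) is kept with weight 1.0. Each non-fraud
    case is kept with probability keep_rate and weighted by 1 / keep_rate,
    so the weighted non-fraud count approximates the original count.
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for index, label in enumerate(labels):
        if label == 1:
            sample.append((index, 1.0))
        elif rng.random() < keep_rate:
            sample.append((index, 1.0 / keep_rate))
    return sample


# Assumed toy population: 5 fraud cases among 1,000 records.
labels = [1] * 5 + [0] * 995
sample = downsample_with_weights(labels, keep_rate=0.25)
weighted_total = sum(weight for _, weight in sample)
print(len(sample), round(weighted_total))
```

The sample is roughly a quarter of the original size, but the weights let a model trained on it still see fraud as the rare event it is.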

Distinctive Characteristics of Machine Learning Models for Fraud

These unique design considerations further dictate how machine learning for identity and fraud is different from other use cases.

Performance Criteria

Standard supervised models are evaluated against precision and recall. Due to the economics of fraud, it is important that fraud models not only have high precision (few false positives) and high recall (high fraud capture rate), but also achieve both within a fairly small fraction of the population. Put simply, the population that can be flagged for additional fraud review has to fit well within operational constraints, which are dictated by several factors:

  • impact to customer future value
  • number of investigators
  • time taken per investigation
  • cost of investigation

Thus, compared to credit risk or marketing models, where performance metrics like the Kolmogorov–Smirnov (KS) and Gini statistics evaluate model performance across the entire distribution of the scored data, fraud models are judged on precision (false positives) and recall (fraud capture rate), measured at alert volumes of 1 to 5 percent of the population.
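The following sketch shows one way to compute these measures. It first derives an alert budget from investigator capacity, then measures precision and recall only within the top-scored slice that fits the budget. All function names, staffing figures, scores and labels are made up for illustration.

```python
def alert_budget(investigators, cases_per_investigator_per_day, daily_volume):
    """Largest share of daily volume the review team can work, capped at 100 percent."""
    capacity = investigators * cases_per_investigator_per_day
    return min(1.0, capacity / daily_volume)


def precision_recall_at_alert_rate(scores, labels, alert_rate):
    """Precision and recall when only the top `alert_rate` share of scores is alerted.

    scores: model fraud scores, higher means riskier
    labels: 1 for confirmed fraud, 0 otherwise (same order as scores)
    """
    if not 0.0 < alert_rate <= 1.0:
        raise ValueError("alert_rate must be in (0, 1]")
    # Round down so the alert count never exceeds capacity, but alert at least one case.
    n_alerts = max(1, int(alert_rate * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    caught = sum(label for _, label in ranked[:n_alerts])
    total_fraud = sum(labels)
    precision = caught / n_alerts
    recall = caught / total_fraud if total_fraud else 0.0
    return precision, recall


# Hypothetical toy data: 10 scored cases, 3 of them confirmed fraud.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
rate = alert_budget(investigators=2, cases_per_investigator_per_day=1, daily_volume=10)
print(rate, precision_recall_at_alert_rate(scores, labels, rate))
```

In this toy data the team can work 20 percent of the volume, so only the two highest scores are alerted. One of them is fraud, which gives a precision of 0.5 and a recall of one in three. A real evaluation would use far more cases and an alert rate in the 1 to 5 percent range.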

Algorithms

The requirement of high precision at low alert volumes means that fraud models need specialized algorithms that are effective at finding a needle in a haystack. Ensemble classifiers, rule induction methods and artificial neural networks are among the most common and successful supervised AI techniques for predicting fraud when adequate fraud labels are available.
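As a toy illustration of the ensemble idea, the sketch below induces one simple threshold rule per feature (a "decision stump") and scores a new case by the share of rules that vote fraud. The features, data and function names are assumptions made for the example. Production systems typically rely on much stronger ensembles such as boosted or bagged trees.

```python
def fit_stump(rows, labels, feature):
    """Find the threshold on one feature with the fewest training errors.

    The induced rule is: value > threshold means fraud.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({row[feature] for row in rows}):
        errors = sum(
            (row[feature] > threshold) != bool(label)
            for row, label in zip(rows, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def fit_ensemble(rows, labels):
    """Fit one stump per feature; together they form a simple voting ensemble."""
    return [fit_stump(rows, labels, feature) for feature in range(len(rows[0]))]


def score(ensemble, row):
    """Share of stumps that vote fraud for this row (0.0 to 1.0)."""
    votes = sum(row[feature] > threshold for feature, threshold in ensemble)
    return votes / len(ensemble)


# Hypothetical features: (amount, failed logins, address mismatches).
rows = [(20, 0, 0), (35, 1, 0), (50, 0, 1), (40, 0, 0), (900, 4, 1), (700, 3, 1)]
labels = [0, 0, 0, 0, 1, 1]
ensemble = fit_ensemble(rows, labels)
print(ensemble)
print(score(ensemble, (800, 5, 1)), score(ensemble, (30, 0, 0)))
```

Each stump on its own is a weak, human-readable rule, and the address-mismatch stump even misclassifies one genuine training case. Averaging the votes gives a score that is more robust than any single rule.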

The dynamic, varying nature of fraud patterns and the lack of strong fraud labels require that we also use unsupervised algorithms to help uncover patterns. Novelty detection, clustering, graph anomaly detection and social network analytics are some of the commonly used AI techniques.
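Novelty detection, the first technique in that list, can be sketched without any fraud labels at all. The example below learns what "normal" looks like from a reference set of known safe values and flags new values that sit far outside it, using a robust z-score built from the median and the median absolute deviation. The robust z-score is a stand-in chosen for brevity, and the threshold of 3.5 and the sample values are assumptions.

```python
from statistics import median


def flag_novelties(new_values, reference, threshold=3.5):
    """Return the new values that deviate strongly from a known safe reference set.

    Deviation is measured as |value - median| / (1.4826 * MAD), where MAD is the
    median absolute deviation of the reference values. The constant 1.4826 makes
    the scale comparable to a standard deviation for normally distributed data.
    """
    center = median(reference)
    mad = median([abs(value - center) for value in reference])
    if mad == 0:
        raise ValueError("reference values have no spread; cannot scale deviations")
    scale = 1.4826 * mad
    return [value for value in new_values if abs(value - center) / scale > threshold]


# Hypothetical signal: minutes between account creation and first transaction.
known_safe = [20, 22, 19, 21, 23, 20, 18, 22]
print(flag_novelties([21, 24, 95], known_safe))
```

Only the value 95 is flagged here. In practice the same idea is applied to many signals at once, and flagged cases go to analysts for confirmation, since an unusual case is not necessarily a fraudulent one.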

Labeled Data

Machine learning models are known to suffer from cold start problems, which typically arise when there are no known cases of fraud or too few of them. Overcoming the problem and iterating on the model usually takes some creative thinking and domain knowledge. Below is an example of how we overcame such a situation for a customer.

We used the principles of active learning, which involves working with people (in this case, fraud analysts) to get feedback. By selectively querying analysts for fraud feedback and expanding the feature set, we were able to iterate on the machine learning model and improve overall performance with every generation. Each new version was produced on a biweekly schedule.
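The selective querying step can be sketched as follows. The query strategy used in the project is not described here, so this example assumes uncertainty sampling, one common choice, in which analysts are asked about the cases the current model is least sure of. The function names, case identifiers and scores are hypothetical.

```python
def select_for_review(scores, budget):
    """Pick the `budget` cases whose fraud scores are closest to 0.5.

    scores: dict mapping case id to the current model's fraud probability
    """
    by_uncertainty = sorted(scores, key=lambda case_id: abs(scores[case_id] - 0.5))
    return by_uncertainty[:budget]


def active_learning_round(scores, labeled, ask_analyst, budget):
    """Query analysts on the most uncertain unlabeled cases and record their answers.

    labeled: dict of case id to label (1 fraud, 0 genuine), updated in place
    ask_analyst: callable that returns the analyst's label for a case id
    """
    unlabeled = {cid: s for cid, s in scores.items() if cid not in labeled}
    for case_id in select_for_review(unlabeled, budget):
        labeled[case_id] = ask_analyst(case_id)
    return labeled


# Hypothetical scores from the current model generation.
scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}
labeled = {"b": 1}  # case b was already reviewed in an earlier round
analyst_answers = {"d": 0, "e": 1}  # stand-in for real analyst feedback
active_learning_round(scores, labeled, analyst_answers.get, budget=2)
print(labeled)
```

After each round, the model would be retrained on the enlarged labeled set and the cycle repeated, which spends scarce analyst time where a label is likely to teach the model the most.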

Unlike the patterns seen in credit risk and marketing use cases, fraud patterns are sophisticated, dynamic and constantly changing. This means the analytic models created to fight fraud today must be equally, if not more, innovative and iterative. When appropriately designed and trained, machine learning models will continually learn and adapt to fast-moving fraud patterns. Businesses can then make more precise decisions over time without compromising the customer experience.

Sriram Tirunellayi

Vice President, Data and Analytics

As Vice President of Data and Analytics at Equifax, Sriram Tirunellayi leads the Global Identity and Fraud Data Sciences division. He is responsible for driving data science products and research and development, as well as advanced identity and fraud solutions for clients across various industries globally. Sriram br[...]