How AI Adjudication is Reducing False Positives in Watchlist Screening

Using large language models to filter out false alerts and improve screening accuracy

AI adjudication for sanction and watchlist matches

False positives are a frustration for compliance teams dealing with sanctions and watchlist screening. Traditional matching methods are intentionally cautious, flagging any name that looks remotely similar to someone on a list. While this helps catch possible risks, it also means that most alerts end up being irrelevant, draining valuable time and resources.
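
To make the problem concrete, here is a minimal sketch of how a cautious fuzzy matcher behaves. It uses Python's standard-library difflib for string similarity; the names, watchlist, and the 0.8 threshold are illustrative assumptions, not any vendor's production logic.

```python
# Minimal illustration of why cautious fuzzy matching over-flags.
# Standard library only; names and the 0.8 threshold are
# illustrative assumptions, not production settings.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

watchlist = ["Viktor Petrov", "Mohammed Al-Rashid"]
customers = ["Victor Petrow", "Viktoria Petrova", "Mohamed Rashid"]

THRESHOLD = 0.8  # cautious: err on the side of flagging

for customer in customers:
    for listed in watchlist:
        score = similarity(customer, listed)
        if score >= THRESHOLD:
            # Every hit above the threshold becomes an analyst alert,
            # whether or not it is actually the same person.
            print(f"ALERT: {customer!r} ~ {listed!r} (score={score:.2f})")
```

Name-only scoring like this cannot tell a listed person from a near-namesake, so every borderline hit lands on an analyst's desk.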

💡 If you want a deeper dive into why this remains such a persistent problem and what actually works to reduce it, we recently published a full article on the topic: The False Positive Race is On.

The biggest breakthrough in addressing this challenge has come from AI adjudication using large language models (LLMs). This technology is fundamentally changing how compliance teams approach screening and is already delivering meaningful improvements.

Beyond Name Similarity: Why Context Matters

LLMs do much more than compare how two names look or sound. These models can consider the context surrounding each name, details like roles, locations, biographies, and other clues that traditional systems simply ignore. This ability to interpret context allows AI adjudication to function as a highly effective second review layer. Instead of sending every "close enough" match to compliance analysts, the system can filter out the noise and surface only the alerts that really matter.
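
As a rough sketch of what that second review layer can look like: the snippet below builds a context-rich prompt from an alert and asks the model for a verdict. The `complete` callable stands in for whatever LLM you use, and the field names and MATCH/NO_MATCH protocol are illustrative assumptions, not DiligenAI's actual interface.

```python
# Sketch of an LLM second-review layer over a fuzzy-match alert.
# `complete` stands in for any LLM call (hosted API or local model);
# the prompt format and alert fields are illustrative only.

def adjudicate(alert: dict, complete) -> bool:
    """Return True if the LLM judges the alert a plausible true match."""
    prompt = f"""You are a sanctions screening analyst.

Customer record:
  Name:       {alert['customer_name']}
  Country:    {alert['customer_country']}
  Born:       {alert['customer_dob']}
  Occupation: {alert['customer_occupation']}

Watchlist entry:
  Name:       {alert['listed_name']}
  Country:    {alert['listed_country']}
  Born:       {alert['listed_dob']}
  Notes:      {alert['listed_notes']}

Considering spelling variants AND the surrounding context
(role, location, biography), could these be the same person?
Answer with exactly MATCH or NO_MATCH."""
    return complete(prompt).strip().upper().startswith("MATCH")
```

The key point is that the model sees the same contextual fields a human analyst would, so it can clear a name-similar hit whose date of birth and country plainly disagree.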

How AI Adjudication Measures Up

To determine the potential impact, we ran a benchmark test comparing two methods:

  • Traditional fuzzy matching only
  • Fuzzy matching followed by LLM review (AI adjudication) with our DiligenAI system (the two-stage pipeline sketched below)
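
In outline, the second method chains the two pieces above: stage one casts a wide fuzzy-matching net tuned for recall, stage two asks the LLM to discard obvious mismatches. This sketch reuses `similarity()` and `adjudicate()` from earlier and is a simplified illustration, not DiligenAI's internals.

```python
# Two-stage screening pipeline, in outline.
# Reuses similarity() and adjudicate() from the sketches above.

def screen(customer: dict, watchlist: list[dict], complete) -> list[dict]:
    alerts = []
    # Stage 1: cautious fuzzy matching, tuned so no true match slips by.
    for entry in watchlist:
        if similarity(customer["customer_name"], entry["listed_name"]) >= 0.8:
            alerts.append({**customer, **entry})
    # Stage 2: LLM adjudication clears the false positives while
    # keeping every alert the model judges a plausible true match.
    return [a for a in alerts if adjudicate(a, complete)]
```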

Here’s how the two methods performed:

Method               False Positive Reduction   Precision   Recall   F1 Score
Fuzzy Matching Only  Baseline                   ~7%         100%     ~13%
Fuzzy + LLM Review   Over 94% reduction         ~63%        100%     ~77%

Both approaches achieved 100 percent recall, meaning neither missed a true match, but the difference in efficiency was stark. AI adjudication delivered far higher precision, boosting the F1 score (a standard measure that balances precision and recall) by nearly six times. The LLM-enhanced system cut false positives by over 94 percent while still surfacing every legitimate alert.
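
As a quick sanity check on those numbers: with recall fixed at 100 percent, the F1 values in the table follow directly from precision via F1 = 2PR / (P + R).

```python
# F1 = 2PR / (P + R); with recall R = 1.0 in both cases,
# the table's F1 values follow directly from precision.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f1(0.07, 1.0))                  # ~0.13 -> fuzzy matching only
print(f1(0.63, 1.0))                  # ~0.77 -> fuzzy + LLM review
print(f1(0.63, 1.0) / f1(0.07, 1.0))  # ~5.9x improvement
```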

Number of False Positives by Screening Method (per 1000 matches)

For compliance professionals, this shift means fewer unnecessary alerts, less manual work, and more time focused on genuine risk. The days of wading through endless irrelevant matches can finally become a thing of the past.

There Is a Tradeoff

To reach this level of accuracy, advanced AI models are currently required. These frontier models come with a higher cost per screening, and hosting them on-premise can be challenging. However, the savings in analyst time and the reduction in operational drag often outweigh the expense, making AI adjudication a powerful tool for organizations focused on efficiency and accuracy.

It’s also important to note that these models are improving rapidly. Costs are coming down, and open-source solutions are advancing quickly, so hosting models in-house is likely to become a realistic option in the near future.

Future

We believe that within the next two to five years, this combination of traditional fuzzy matching and large language model-driven contextual review will become the industry standard for name screening. The results are clear: far fewer false positives, higher precision, and a more efficient use of compliance resources.

For compliance teams, this means less time spent clearing irrelevant alerts and more focus on risks. The technology is already here, and its advantages are only becoming clearer as it continues to evolve.