Alexandrea Ridden and Toon Segers

Alex is CEO of the graph analytics company Knights Analytics; Toon is CPO at Roseman Labs.

Published on: 23 April 2021

Privacy Preserving Entity Resolution and Graph Building

Privacy preserving entity resolution across two companies

In this post we present a new solution for linking data and entities across privacy boundaries. The system uses state-of-the-art machine learning techniques to match entities, and Secure Multi-Party Computation to keep each party's data private. This solution is the result of a collaboration between two companies: Knights Analytics and Roseman Labs.

A partnership between Knights Analytics and Roseman Labs enables companies to deploy state-of-the-art entity resolution at scale (billions of records) across jurisdictions, companies and completely different data sources. A modern privacy technology called secure multi-party computation ensures that all computations are performed where the source data resides, “bringing the algorithm to the data”, so that data does not leave the country.

About Roseman Labs

Roseman Labs is the first Dutch company that delivers privacy-preserving analytics based on Secure Multi-Party Computation (MPC). Roseman Labs offers its Virtual Data Lake technology, which enables parties to perform analyses across their separate data silos without the need to share or expose the underlying data. The unprecedented privacy guarantees significantly accelerate the process of setting up data-driven collaborations.

About Knights Analytics

Knights Analytics uses its cutting-edge network building algorithms and extremely scalable architecture to build enterprise knowledge graphs that help companies gain insights and power their analytics. Knights Analytics provides a proprietary, next-generation entity resolution engine, designed to be easy to deliver and maintain.

Identity Matching Using Network Analytics

Knights Analytics’ entity resolution engine can knit together data from multiple internal and external data sources, finding connections that a normal account or customer ID may not capture.

Once connections across siloed data sources are found, the engine can assemble a unified view of the entity, with the goal that one real-world entity is represented only once in the data.
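To make the idea of a unified view concrete, here is a minimal sketch of one common way to cluster records into entities: records that share a normalised attribute value are linked, and linked records are merged with a union-find structure. This is an illustration only, not the Knights Analytics engine. The function name `resolve_entities`, the record fields and the exact-match rule are all assumptions made for the example; a production engine would typically use fuzzy or learned matching.

```python
from collections import defaultdict


def resolve_entities(records, match_keys):
    """Group records that share a value on any match key into one entity."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Link every record to the first record seen with the same key value.
    first_seen = {}
    for rec in records:
        for key in match_keys:
            value = rec.get(key)
            if value is None:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                union(rec["id"], first_seen[token])
            else:
                first_seen[token] = rec["id"]

    # Collect record ids per cluster root.
    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical records from three siloed sources.
records = [
    {"id": "crm-1", "email": "j.doe@example.com", "phone": "555-0100"},
    {"id": "bank-7", "email": "J.Doe@example.com", "phone": None},
    {"id": "web-3", "email": None, "phone": "555-0100"},
    {"id": "crm-2", "email": "a.smith@example.com", "phone": "555-0199"},
]
entities = resolve_entities(records, match_keys=["email", "phone"])
```

In this toy data the first three records end up in one entity even though no single ID connects them: the email links two of them and the phone number links the third.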

The engine combines innovative network building algorithms with novel, best-in-class entity detection and resolution algorithms, and is deployed with cloud-ready, scalable graph database and cluster management technology.

Multi-Party Computation

MPC is a cryptographic technique that enables multiple parties to perform computations on their joint data set, such that each party learns nothing beyond its own input and, optionally, the output of these computations.

The data input by one party remains hidden from the other parties. The theoretical foundations of MPC date back to the 1980s (Wiki). Recently, MPC has become mature enough to be of practical value, thanks to advances in computer and networking technologies and in the theory of MPC itself.
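A small example may help show how parties can compute on joint data without seeing each other's inputs. The sketch below uses additive secret sharing, one of the standard building blocks of MPC, to compute a sum: each input is split into random-looking shares, and only the combined result is ever reconstructed. It simulates all parties in one process and is not the Roseman Labs protocol; the names `share`, `reconstruct` and `secure_sum` and the choice of modulus are assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an assumption of this sketch


def share(value, n_parties):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; only meaningful when all shares are present."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    n = len(private_inputs)
    # Each party splits its own input and hands one share to every party.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; each share looks random on its own.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined into the output.
    return reconstruct(partial_sums)


# Three parties with private values; only the total is revealed.
total = secure_sum([120, 45, 300])
```

No party ever holds enough shares of another party's input to recover it, yet the published partial sums combine to the correct total.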

A Unique Solution

To enable entity resolution, we deploy Knights Analytics' network and entity resolution engine at scale on each side of the privacy boundary. In the diagram below we assume two customers who want to benefit from sharing data without exposing that data to one another.

Each customer builds an independent network of their data. This unifies disparate data sources, using state-of-the-art graph database and cluster management technology.

The MPC technology and protocols then query across both graphs and ensure that records never leave the privacy boundary, and that only entities or patterns which pass a given threshold are output. Throughout the process, the algorithm is applied within each privacy boundary on encrypted data, ensuring end-to-end privacy.

Use Cases

Anti-Money Laundering

Banks are required to detect money laundering activities of their clients. This often involves understanding money flows across borders, but privacy and bank secrecy laws prevent banks from sharing data between many countries, which makes it extremely difficult for companies to collaborate on detecting money laundering. The fines are vast and the investigations time-consuming.

Part of the challenge in following money flows is to match accounts belonging to the same companies or individuals. Matching identities is often referred to as entity resolution. Knights Analytics can build networks of transactions at scale in each jurisdiction, matching and combining entities and accounts to unify disparate data sources.

Alerts of suspicious entities from each jurisdiction are then encrypted and resolved across borders using a second entity resolution algorithm, applied to this encrypted data, allowing investigators to follow suspicious money flows outside of their own bank/country.

Our MPC technology and protocols ensure that bank records never leave the country, and that only the results (e.g., alerts with a cross border risk score) are fed back to each jurisdiction. Throughout the process, the algorithm is applied within each jurisdiction on encrypted data, ensuring end-to-end privacy.
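As a rough illustration of matching on protected data, the sketch below tests whether two secret-shared identifiers are equal without revealing either one. It uses a textbook technique: the difference of the two values is multiplied by a random nonzero mask with a Beaver triple, and only the masked product is opened, which is zero exactly when the identifiers match. This is a simplified two-party simulation and not the protocol used in the solution. The helper names, the trusted dealer that hands out randomness, the exact-match rule and the example account numbers are all assumptions.

```python
import hashlib
import secrets

PRIME = 2**61 - 1  # prime field modulus; an assumption of this sketch


def encode(identifier):
    """Map a normalised identifier to a field element via SHA-256."""
    digest = hashlib.sha256(identifier.strip().lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % PRIME


def share(value):
    """Split value into two additive shares mod PRIME."""
    s0 = secrets.randbelow(PRIME)
    return [s0, (value - s0) % PRIME]


def dealer_randomness():
    """Hypothetical trusted dealer: shares of a Beaver triple and a nonzero mask."""
    u, v = secrets.randbelow(PRIME), secrets.randbelow(PRIME)
    r = 1 + secrets.randbelow(PRIME - 1)  # uniformly random in [1, PRIME - 1]
    return share(u), share(v), share(u * v % PRIME), share(r)


def shared_equal(a_shares, b_shares):
    """Return True iff the shared values are equal, opening only r * (a - b)."""
    u, v, w, r = dealer_randomness()
    # Each party subtracts its shares locally to get shares of d = a - b.
    d = [(a_shares[i] - b_shares[i]) % PRIME for i in range(2)]
    # Beaver multiplication: open the masked values e = d - u and f = r - v.
    e = sum(d[i] - u[i] for i in range(2)) % PRIME
    f = sum(r[i] - v[i] for i in range(2)) % PRIME
    z = [(w[i] + e * v[i] + f * u[i]) % PRIME for i in range(2)]
    z[0] = (z[0] + e * f) % PRIME  # the public term is added by one party only
    # Opening z = r * d gives 0 for a match and a random nonzero value otherwise.
    return sum(z) % PRIME == 0


# Each jurisdiction secret-shares the account identifier behind an alert.
alert_nl = share(encode("NL91ABNA0417164300"))
alert_uk = share(encode("nl91abna0417164300 "))
alert_other = share(encode("GB29NWBK60161331926819"))
same = shared_equal(alert_nl, alert_uk)
different = shared_equal(alert_nl, alert_other)
```

Because the mask is random and nonzero, a non-match opens to a value that carries no information about either identifier. The sketch handles exact matches only, and reducing the hash to 61 bits would produce occasional false matches at very large scale, so a real deployment would need a wider encoding and a matching rule that tolerates variation in the data.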

Clinical Trials Early Evaluation

It is not easy to measure progress during a clinical trial. MPC and Graph Technology make it possible to get an aggregated view of the trial data without unmasking any patient data.

With the median cost of a clinical trial phase ranging from $13 million to more than $62 million, and many trials requiring multiple phases, stopping trials early because of poor efficacy or poor results could save drug developers millions of dollars.

Companies sometimes partially unmask data at a checkpoint during a trial to gain insight, enabling them to stop the trial early. When they do, the unmasked patients’ data is discarded from the final read-out. Our solution enables the continuous pooling of results whilst preserving patient privacy throughout a trial.

Graph Technology allows data from multiple hospitals/treatment centres to be combined, and MPC allows privacy-preserving computations on that patient information. Alert signals can be generated “in the blind”, and data will then only be shared with the decision makers if pre-defined thresholds or rules are triggered.

To conclude

By combining forces and applying disruptive technologies to the data sharing challenge, Knights Analytics and Roseman Labs offer an unprecedented ability to collaborate. The benefits of sharing data are well known but rarely realised because of privacy restrictions. Combining state-of-the-art entity resolution and graph building techniques with fast, secure, privacy-preserving analytics makes those benefits attainable.

We thank Alex Ridden of Knights Analytics for co-authoring this post!

(Photo by Omar Flores on Unsplash)