Trusted Staging: Link, analyze, and train AI without access to raw data

Written by Freya de Mink | Feb 21, 2025 1:16:59 PM

On February 13, Health-RI presented a blueprint for 'Trusted Staging', an umbrella term for activities that take place without the data being visible to researchers. Trusted Staging goes beyond linking and pre-processing: even complete analyses are possible without access to the data.

This makes the concept a logical starting point for new techniques for data security and privacy, such as those from Roseman Labs.

Shielded and decentralized where possible, flexible and detailed where necessary

Trusted Staging includes six activities: Linking, Privacy & Compliance, Federated Analysis, Exploration, Synthetic Data, and AI. Roseman Labs supports all of these activities while the data remains fully encrypted and decentralized.

Secure linking – High-quality linking often requires a combination of fields, including (very) sensitive personal data. Roseman Labs offers the safest and most flexible solution for this. Linking on inexact fields (e.g. with spelling mistakes) is also possible.
Privacy & Compliance – Data Protection Impact Assessments (DPIAs) for the use of Roseman Labs are efficient and transparent, because the technique enforces purpose limitation and risk minimization. Data holders remain in full control; only results are revealed.
Explorer Feature – Questions about cohort size, overlap, data quality, and available fields are ideally suited to Multi-Party Computation (MPC), because high-level answers suffice. Roseman Labs supports data exploration on very large data volumes with a unique approach; read more about it here.
Federated analysis – Roseman Labs combines the power of decentralized computing with privacy guarantees, confidentiality for partial results and the possibility of vertical linking. Read more here and here.
Synthetic data – Datasets that resemble the source data, but do not contain actual personal data, are an important tool for exploring and preparing a (federated) analysis. In the Roseman Labs platform, you use dummy data (a simple form of synthetic data) to create an example analysis, which you use to request approval from the data holders.
AI and Machine Learning – Training and applying prediction models to fully encrypted data from multiple sources is what Roseman Labs MPC is all about. You can learn more about it in this blog and this video.

MPC should be the standard for data exploration, because access to details is almost never necessary and is therefore not proportionate.
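To make this concrete, here is a minimal sketch of additive secret sharing, one building block commonly used in MPC. It shows how several data holders could learn a combined cohort size without any of them revealing their own count. This is a toy illustration under stated assumptions, not the Roseman Labs protocol or API: the function names, the modulus, and the hospital counts are all hypothetical.

```python
import secrets

# Modulus for share arithmetic (an assumption for this toy example).
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties random-looking additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME

def secure_total(private_counts):
    """Compute the sum of all counts using only shares and partial sums."""
    n = len(private_counts)
    # Each data holder splits its own count into one share per party.
    all_shares = [share(count, n) for count in private_counts]
    # Party j receives the j-th share from every holder and adds them locally.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME
        for j in range(n)
    ]
    # Only the partial sums are combined; no individual count is revealed.
    return reconstruct(partial_sums)

# Hypothetical cohort sizes held by three separate data holders.
hospital_counts = [120, 75, 310]
print(secure_total(hospital_counts))  # 505
```

Each individual share is a uniformly random number, so it says nothing about the count it came from; only the final total is meaningful. Production MPC systems add much more on top of this, such as secure channels, protection against misbehaving parties, and support for richer computations.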

When will researchers get data access?

After processing in the Trusted Staging environment, results become available in a Secure Processing Environment (SPE), such as myDRE (Digital Research Environment) from anDREa B.V. This is an environment where researchers can see results and details to the extent necessary and proportionate. Copying and exporting from this environment is not possible.

The combination of Trusted Staging and Secure Processing Environment forms a two-stage rocket that encourages the use of the strongest available security techniques. If an analysis does require access to details, this is always possible after explicit approval, even at a later stage of research.

Why is this a positive development?

In many designs for secondary data use, privacy protection is implemented as a step of pseudonymization, aggregation and/or selection, which in many cases makes linking and adding details later complex. Data sets are then brought to secure processing environments where analysts have free access.

Adding Trusted Staging explicitly provides room for further protected processing and is more in line with what new privacy-protecting techniques make possible.

The Dutch version of this article can be found here.

Generate new insights on sensitive data with Roseman Labs’ secure Multi-Party Computation technology. Want to find out how your organization can do that? Contact us using the form below.
