Federated Learning is often misunderstood as a privacy technology. In this post we explain how to reap the benefits of Federated Learning for training AI models at scale, while preserving privacy.
Federated Learning (FL) is a distributed-computation paradigm for training machine learning (ML) models, first proposed by McMahan et al. (2016). With FL, a model is trained across multiple clients, such as mobile phones or edge servers, without sending the raw training data to a central coordinating server.
The model is trained locally on each device; only the model updates, such as gradient information or newly computed model parameters, are sent to the central server or exchanged directly between the clients.
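To make one such training round concrete, here is a minimal FedAvg-style sketch in Python. Everything in it is an illustrative assumption, not our product's API: the toy linear-regression task, the `local_update` helper, the client data, and the hyperparameters are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps
    on a toy linear-regression task (hypothetical example)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three clients, each holding private data the server never sees.
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ w_true))

# One federated round: clients train locally on their own data,
# then the server averages the returned weights (FedAvg-style).
global_w = np.zeros(3)
local_ws = [local_update(global_w, X, y) for X, y in clients]
global_w = np.mean(local_ws, axis=0)
```

Note that the server only ever handles the weight vectors in `local_ws`, never the clients' `(X, y)` pairs; as the next paragraphs explain, that alone is not enough for privacy.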
Hence, the Federated Learning paradigm is about "bringing compute to the data", as opposed to the usual approach of "bringing data to the compute" of centralized machine learning. As such, FL is primarily a bandwidth-saving technique.
As training data is not shared directly, FL is often wrongly perceived as a technology that protects the privacy of the training data. However, numerous recent research works (see references below) have shown that, without special countermeasures, FL leaks significant information about the raw training data through the repeated sharing of model updates during training.
References (and references therein) include: Zhu et al. (2019); Geiping et al. (2020); Geng et al. (2021); Yin et al. (2021); Dimitrov et al. (2022); Zhao et al. (2024).
Roseman Labs has built an encrypted dataspace that already features privacy-preserving training of several machine learning model architectures. Given the strong customer demand for Federated Learning capabilities, we set out to develop a 'flavor' of Federated Learning that does not compromise on our product's uniform set of strong data confidentiality guarantees.
In a nutshell, we achieve this by letting clients share the parameters of their locally trained model only once ("one-shot") with the central aggregator. Moreover, we implement this aggregation step using our encrypted computing solution.
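As a rough intuition for the aggregation step, here is a sketch of secure averaging via additive secret sharing over a prime field. This is a simplified stand-in, not our actual MPC protocol: the modulus, the fixed-point scaling, the `share` helper, and the three-party setup are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
P = 2**31 - 1   # public prime modulus (illustrative choice)
SCALE = 10**6   # fixed-point scaling to embed floats into Z_P

def share(vec, n_parties=3):
    """Additively secret-share a fixed-point vector: each share alone
    is uniformly random; all shares together sum to the secret mod P."""
    fixed = np.round(vec * SCALE).astype(np.int64) % P
    shares = [rng.integers(0, P, size=vec.shape) for _ in range(n_parties - 1)]
    shares.append((fixed - sum(shares)) % P)
    return shares

# Each client secret-shares its locally trained weights once
# ("one-shot"); no single party ever sees plaintext weights.
client_weights = [rng.normal(size=4) for _ in range(3)]
all_shares = [share(w) for w in client_weights]

# Each compute party locally sums the shares it received; recombining
# the per-party sums yields the sum of all client weight vectors.
party_sums = [sum(s[i] for s in all_shares) % P for i in range(3)]
total = sum(party_sums) % P

# Map back from Z_P to signed fixed point, then average.
signed = np.where(total > P // 2, total - P, total)
avg = signed / SCALE / len(client_weights)
```

Because each client sends its shares exactly once, there is no repeated exposure of model updates for an attacker to exploit across rounds.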
In cases where this form of federated model training is applicable, we protect the privacy of the data and achieve significant training-time speedups as well as bandwidth savings compared to centralized encrypted machine learning. With our new One-Shot Federated Learning proposition, we let you have your cake and eat it, too!
Generate new insights on sensitive data with Roseman Labs’ secure Multi-Party Computation technology. Want to find out how your organization can do that? Contact us using the form below.