FELDSPAR: Federated Learning with Model Ownership Protection and Privacy Armoring

Year: 2022

Principal researcher: Fernando Pérez González

Financing: Ministerio de Ciencia e Innovación. "Proyectos Estratégicos Orientados a la Transición Ecológica y a la Transición Digital" 2021

Period: 01/12/2022 - 30/09/2025

ID: TED2021-130624B-C21

Artificial intelligence (AI) is rapidly becoming one of the main driving forces of the Digital Transition; to be truly effective, however, it requires access to large quantities of data to train increasingly complex models. Gathering and managing all this information is a challenge in itself, and the problem is considerably aggravated when the information is highly privacy-sensitive. The required raw data is typically held separately by multiple independent institutions, and sharing it is not always possible because of strict data protection and privacy regulations such as the GDPR. Federated Learning (FL) has recently emerged as a convenient solution to address privacy concerns and regulations in this distributed learning problem. Specifically, FL is a technology that enables collaborative machine learning (ML) by training local models on data distributed across multiple entities without exposing their private datasets. Even though this type of collaborative training avoids outsourcing the local data to a central aggregator, research published in recent years points to the feasibility of re-identifying data subjects in the FL setting.
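The collaborative training loop described above can be illustrated with a minimal sketch of federated averaging, a widely used FL baseline. This is an illustration only, not FELDSPAR's method: the one-parameter linear model, the synthetic client data, and all function names (`local_update`, `fed_avg`, `make_client`) are assumptions introduced for the example. Each client trains on its own data, and only the model parameter, never the raw data, is sent to the aggregator.

```python
import random

def local_update(w, data, lr=0.5, epochs=5):
    """One client's local training: gradient descent on squared error for y ~ w*x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(clients, rounds=20, w0=0.0):
    """Aggregator loop: broadcast w, collect local models, average by dataset size."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_models = [local_update(w, d) for d in clients]
        # The aggregator only sees model parameters, not the clients' datasets.
        w = sum(len(d) * wk for d, wk in zip(clients, local_models)) / total
    return w

def make_client(n, slope=3.0, seed=0):
    """Hypothetical private dataset: noisy samples of y = slope * x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 0.01)) for x in xs]

clients = [make_client(20, seed=1), make_client(50, seed=2), make_client(30, seed=3)]
w_global = fed_avg(clients)
print(f"global model parameter: {w_global:.3f}")  # should be close to the true slope of 3
```

Note that the shared parameters themselves can still leak information about the training data, which is the residual risk the paragraph above refers to.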

Another risk that is considerably magnified in a collaborative learning setting concerns copyright infringement and model ownership. In most current practical scenarios, collaborative training is seen as a convenient device to deal with privacy, and the global model is owned or managed by the aggregator (or by a third party that acquires the copyright). However, because of the way training is carried out, the aggregator must share intermediate models, partially or fully, with the data owners so that they can compute updates locally on their own data. This multiplicity of copies of the same model dramatically increases the risk of leakage or theft, with enormous economic repercussions, since training a complex model demands huge amounts of data and energy.

The main objective of FELDSPAR is to eliminate or mitigate the residual privacy risks of FL in order to restore trust and legal compliance. Privacy-enhancing technologies will be investigated and applied to the FL model updates, including secure multiparty computation (SMPC), homomorphic encryption (HE), and Trusted Execution Environments (TEE). The right balance between them will be achieved by trading off computational complexity, communication costs, and the need for a specialized architecture. Another core contribution of FELDSPAR will be the measurement and containment of privacy risks, which will be achieved by devising a novel privacy metric grounded in information theory and tailored to the needs of FL. A third major contribution will be to propose novel algorithms for ML watermarking that are suitable for FL scenarios, in order to protect copyright and assess compliance. Finally, taking inspiration from the traitor-tracing codes used in classical watermarking, FELDSPAR will provide mechanisms for producing copies of the global model that are fingerprinted by the aggregator, so that possible unlawful uses by Data Owners can be traced. To these ends, FELDSPAR aims to conduct cutting-edge research on topics such as multivariate ring-based lattice cryptography, multi-key homomorphic encryption, and triggering mechanisms for black-box watermarking.
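To give a flavor of how a privacy-enhancing technology can be applied to FL model updates, the following is a minimal sketch of secure aggregation by pairwise additive masking, a simple SMPC-style technique. It is not the protocol FELDSPAR proposes. It assumes that updates are already quantized to integers, that all parties are honest and none drops out, and it uses a seeded `random.Random` as a stand-in for pairwise key agreement, which is not cryptographically secure. All names (`MOD`, `pairwise_masks`, `mask_updates`, `aggregate`) are hypothetical.

```python
import random

MOD = 2**32  # all arithmetic is done modulo a fixed value (assumption)

def pairwise_masks(n_clients, seed=0):
    """For every pair (i, j), client i adds a shared random value and client j subtracts it."""
    rng = random.Random(seed)  # stand-in for pairwise key agreement; NOT secure
    masks = [0] * n_clients
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = rng.randrange(MOD)
            masks[i] = (masks[i] + r) % MOD
            masks[j] = (masks[j] - r) % MOD
    return masks

def mask_updates(updates, masks):
    """Each client sends only its masked update to the aggregator."""
    return [(u + m) % MOD for u, m in zip(updates, masks)]

def aggregate(masked):
    """The pairwise masks cancel in the sum, so only the total is revealed."""
    return sum(masked) % MOD

updates = [17, 42, 5]  # hypothetical quantized model updates, one per client
masks = pairwise_masks(len(updates))
masked = mask_updates(updates, masks)
total = aggregate(masked)
print(total)  # 64, the sum of the individual updates
```

The aggregator learns the sum of the updates but sees each individual contribution only in masked form. Homomorphic encryption or a TEE could play a similar role, with different costs in computation, communication, and hardware requirements.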