Does Federated Learning Preserve Data Privacy?

Over the past five years, federated learning has gained wide popularity in the research community, in part because of its promise of protecting data privacy. However, some studies have questioned this privacy-preserving capability, proposing gradient leakage attacks that reconstruct the training data. Yet experiments conducted on the Plato platform, using federated learning for image classification and natural language processing tasks, show that these claims about gradient leakage attacks do not hold, and that data privacy in federated learning is in fact well protected.
For the fourteenth session of the AIRS-TNSE Joint Distinguished Seminar Series, we are honored to invite Professor Baochun Li to discuss the data privacy protection capabilities of federated learning, and to share his research results and interesting findings in this area.
AIRS-TNSE Joint Distinguished Seminar Series is co-sponsored by IEEE Transactions on Network Science and Engineering (TNSE) and Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), with joint support from The Chinese University of Hong Kong, Shenzhen, Network Communication and Economics Laboratory (NCEL), and IEEE. This series aims to bring together top international experts and scholars in the field of network science and engineering to share cutting-edge scientific and technological achievements.
Join the seminar through Bilibili (http://live.bilibili.com/22587709).
-
Jianwei Huang
Vice President, AIRS; Presidential Chair Professor, CUHK-Shenzhen; Editor-in-Chief, IEEE TNSE; IEEE Fellow; AAIA Fellow
Executive Chair
-
Baochun Li
Professor, Department of Electrical and Computer Engineering, University of Toronto; Fellow of the Canadian Academy of Engineering; IEEE Fellow
Does Federated Learning Preserve Data Privacy?
Professor Baochun Li received his B.Engr. degree from the Department of Computer Science and Technology, Tsinghua University, China, in 1995, and his M.S. and Ph.D. degrees from the Department of Computer Science, University of Illinois at Urbana-Champaign, in 1997 and 2000, respectively. Since 2000, he has been with the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently a Professor. He has held the Bell Canada Endowed Chair in Computer Engineering since August 2005. His current research interests include cloud computing, security and privacy, distributed machine learning, federated learning, and networking.
Dr. Li has published more than 470 papers, with over 25,000 citations, an H-index of 88, and an i10-index of 338. He received the IEEE Communications Society Leonard G. Abraham Award in the Field of Communications Systems in 2000, the IEEE Communications Society Multimedia Communications Best Paper Award in 2009, and the University of Toronto McLean Award in the same year. He received the Best Paper Award at IEEE INFOCOM 2023 and the IEEE INFOCOM Achievement Award in 2024. He is a Fellow of the Canadian Academy of Engineering, a Fellow of the Engineering Institute of Canada, and a Fellow of the IEEE.
As one of the practical paradigms that preserve data privacy when training a shared machine learning model in a decentralized fashion, federated learning has been studied extensively in the past five years. However, a substantial amount of existing work in the literature has questioned its core claim of preserving data privacy, proposing gradient leakage attacks to reconstruct the raw data used for training. In this day and age of fine-tuning large language models, whether data privacy can be preserved is critically important.
In this talk, I will show that despite the conventional wisdom that federated learning is prone to privacy leaks, data privacy may, in fact, be quite well protected. Claims in the existing literature on gradient leakage attacks did not hold up in our experiments, for both image classification and natural language tasks. Our extensive array of experiments was based on Plato, an open-source framework that I developed from scratch for reproducible benchmarking comparisons in federated learning.
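For readers unfamiliar with gradient leakage attacks, the following is a minimal, self-contained sketch of a DLG-style ("Deep Leakage from Gradients") attack, in which an attacker optimizes dummy inputs and labels so that their gradients match the gradients shared by a client. It is purely illustrative: the model, data shapes, and hyperparameters are assumptions, and it is not the experimental setup used in this talk or in Plato.

```python
# A minimal sketch of a DLG-style gradient leakage attack. The model, shapes,
# and hyperparameters are illustrative assumptions, not the talk's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
criterion = nn.CrossEntropyLoss()

# The "victim" client computes gradients on its private example and shares them.
x_true = torch.randn(1, 3, 32, 32)
y_true = torch.tensor([3])
true_grads = torch.autograd.grad(criterion(model(x_true), y_true),
                                 model.parameters())
true_grads = [g.detach() for g in true_grads]

# The attacker sees only the shared gradients, and optimizes dummy data and
# soft labels so that their gradients match, hoping the dummy input converges
# to the client's private example.
x_dummy = torch.randn(1, 3, 32, 32, requires_grad=True)
y_dummy = torch.randn(1, 10, requires_grad=True)
optimizer = torch.optim.LBFGS([x_dummy, y_dummy])

for _ in range(100):
    def closure():
        optimizer.zero_grad()
        # Cross entropy with soft (optimizable) labels.
        dummy_loss = torch.sum(F.softmax(y_dummy, dim=-1) *
                               -F.log_softmax(model(x_dummy), dim=-1))
        dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                          create_graph=True)
        # Distance between the attacker's gradients and the shared gradients.
        grad_diff = sum(((dg - tg) ** 2).sum()
                        for dg, tg in zip(dummy_grads, true_grads))
        grad_diff.backward()
        return grad_diff
    optimizer.step(closure)
```

Whether such an optimization actually recovers recognizable training data under realistic federated learning settings is precisely the question examined in this talk.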
Video Archive