MARL=PPAD

Markov games form the foundation for the study of multi-agent reinforcement learning and sequential agent interaction; their optimal solutions are called Markov perfect equilibria (MPE). Prior work has shown that computing an MPE in infinite-horizon Markov games is at least PPAD-hard. Professor Xiaotie Deng and his collaborators therefore proposed the approximate Markov perfect equilibrium as a solution concept for the computational problem of infinite-horizon multi-player general-sum Markov games, and proved that this concept both preserves the Markov perfect property and is PPAD-complete. This solution concept lays a computational-complexity foundation for extending successful multi-agent learning algorithms from static two-player games to dynamic multi-player Markov games, opening new paths and ideas for research on distributed artificial intelligence and multi-agent systems.
For the eighth session of the AIRS-TNSE Joint Distinguished Seminar Series, we are honored to invite Professor Xiaotie Deng of Peking University to introduce approximate perfect equilibria in Markov games and to share his related research results and interesting findings in this area.
AIRS-TNSE Joint Distinguished Seminar Series is co-sponsored by IEEE Transactions on Network Science and Engineering (TNSE) and Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), with joint support from The Chinese University of Hong Kong, Shenzhen, Network Communication and Economics Laboratory (NCEL), and IEEE. This series aims to bring together top international experts and scholars in the field of network science and engineering to share cutting-edge scientific and technological achievements.
Join the seminar on April 28 through Bilibili (http://live.bilibili.com/22587709).
-
Jianwei Huang
Vice President, AIRS; Presidential Chair Professor, CUHK-Shenzhen; Editor-in-Chief, IEEE TNSE; IEEE Fellow; AAIA Fellow
Executive Chair
-
Xiaotie Deng
Chair Professor, Peking University; Council Member of the Game Theory Society; Member of Academia Europaea; ACM, IEEE, CSIAM Fellow
Talk title: MARL=PPAD
Professor Xiaotie Deng received his bachelor's degree from Tsinghua University in 1982, his master's degree from the Chinese Academy of Sciences in 1984, and his Ph.D. from Stanford University in 1989. In December 2017 he joined Peking University as a Chair Professor at the Center on Frontiers of Computing Studies, School of Computer Science. He previously taught at Shanghai Jiao Tong University, the University of Liverpool, City University of Hong Kong, and the University of York, and before that was an NSERC International Fellow at Simon Fraser University. His main research interests are algorithmic game theory, blockchain, Internet economics, online algorithms, and parallel computing. He was elected an ACM Fellow in 2008 for his contributions to algorithmic game theory, an IEEE Fellow in 2019 for his contributions to computing with incomplete information and in interactive environments, a foreign member of Academia Europaea in 2020, and a Fellow of the China Society for Industrial and Applied Mathematics (CSIAM) in 2021. In 2021 he was also appointed a council member of the Game Theory Society (GTS), appointed an honorary council member of the Game Theory Branch of the Operations Research Society of China, and received the Multi-Agent and Multi-Agent Systems Research Achievement Award from the CCF Artificial Intelligence Society. In 2022 he received the ACM Test of Time Award in computational economics.
Similar to the role of Markov decision processes in reinforcement learning, Markov games (also known as stochastic games) form the basis for the study of multi-agent reinforcement learning and sequential agent interaction. We introduce an approximate Markov perfect equilibrium as a solution concept for the computational problem of solving finite-state stochastic games under infinite-horizon discounting, and prove its PPAD-completeness. This solution concept preserves the Markov perfect property, opening the possibility of extending successful multi-agent reinforcement learning algorithms to multi-agent dynamic games and thus enlarging the known range of PPAD-complete problems.
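To make the solution concept concrete, the following is a minimal illustrative sketch (not the algorithm from the talk or the paper): in a small randomly generated two-player general-sum discounted Markov game, it checks how far a given stationary policy profile is from being an ε-Markov-perfect equilibrium, by comparing each player's value under the profile with the value of its best stationary deviation (the induced single-agent MDP, solved by value iteration). All game data here are hypothetical.

```python
# Illustrative sketch only: measuring the epsilon for which a stationary
# policy profile is an epsilon-MPE in a tiny two-player discounted Markov
# game. The game instance is randomly generated, not from the paper.
import numpy as np

S, A = 2, 2      # two states, two actions per player
gamma = 0.9      # discount factor

rng = np.random.default_rng(0)
# r[i, s, a1, a2]: reward of player i; P[s, a1, a2, s']: transition kernel
r = rng.uniform(0.0, 1.0, size=(2, S, A, A))
P = rng.uniform(0.0, 1.0, size=(S, A, A, S))
P /= P.sum(axis=-1, keepdims=True)

# Stationary policy profile pi[i, s, a]: here, uniformly random play
pi = np.full((2, S, A), 1.0 / A)

def value(i, pi):
    """V_i(s) of player i under the joint policy, via a linear solve."""
    d = np.einsum('sa,sb->sab', pi[0], pi[1])   # joint action distribution
    R = np.einsum('sab,sab->s', d, r[i])        # expected stage reward
    T = np.einsum('sab,sabt->st', d, P)         # induced Markov chain
    return np.linalg.solve(np.eye(S) - gamma * T, R)

def best_response_value(i, pi):
    """Best stationary-deviation value for player i with the opponent
    fixed (an ordinary MDP), solved by value iteration."""
    j = 1 - i
    if i == 0:   # marginalise out the opponent's action
        Ri = np.einsum('sb,sab->sa', pi[j], r[i])
        Ti = np.einsum('sb,sabt->sat', pi[j], P)
    else:
        Ri = np.einsum('sa,sab->sb', pi[j], r[i])
        Ti = np.einsum('sa,sabt->sbt', pi[j], P)
    V = np.zeros(S)
    for _ in range(2000):
        V = (Ri + gamma * np.einsum('sat,t->sa', Ti, V)).max(axis=1)
    return V

# Largest gain any player can obtain at any state by deviating:
# the profile is an eps-MPE exactly for eps at least this large.
eps = max((best_response_value(i, pi) - value(i, pi)).max() for i in range(2))
print(f"profile is an eps-MPE for eps = {eps:.4f}")
```

The key point the sketch illustrates is that *verifying* an approximate MPE only requires solving two single-agent MDPs, whereas *finding* one is what the PPAD-completeness result characterizes.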
Video Archive