Paper Title

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Paper Authors

Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, Xiaolong Wang

Paper Abstract

In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially.
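
Although the abstract only describes the method at a high level, the stage-wise loop it outlines (maintain several parallel sets of agents, mix-and-match and fine-tune them at the larger population, then promote the sets with the best adaptability) can be summarized in a short skeleton. The sketch below is only an illustration of that loop, not the authors' implementation: `init_sets`, `env_for_stage`, `mix_and_match`, `fine_tune` (standing in for MADDPG fine-tuning), and `fitness` are all hypothetical callables supplied by the caller.

```python
from itertools import combinations

def epc_loop(init_sets, env_for_stage, mix_and_match, fine_tune, fitness,
             num_stages, survivors=3):
    """Stage-wise EPC skeleton: grow the population, keep the best-adapting sets.

    All arguments except num_stages/survivors are hypothetical callables,
    not part of the authors' released code.
    """
    # Several parallel sets of agents, trained at the smallest population size.
    candidate_sets = list(init_sets)

    for stage in range(1, num_stages + 1):
        env = env_for_stage(stage)  # environment with this stage's (larger) population

        # Mix-and-match agents across parallel sets to form scaled-up candidate sets,
        # then fine-tune each combination in the larger environment.
        scaled = [fine_tune(mix_and_match(set_a, set_b), env)
                  for set_a, set_b in combinations(candidate_sets, 2)]

        # Evolutionary selection: promote the sets with the best adaptability
        # (fitness) to the next stage of the curriculum.
        scaled.sort(key=lambda s: fitness(s, env), reverse=True)
        candidate_sets = scaled[:survivors]

    return candidate_sets[0]
```

Passing the helpers in as callables keeps the skeleton self-contained and runnable without committing to any particular MARL library or environment.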
