Paper Title
Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks
Paper Authors
Paper Abstract
Reinforcement learning systems require good representations to work well. For decades, practical success in reinforcement learning was limited to small domains. Deep reinforcement learning systems, on the other hand, are scalable, do not depend on domain-specific prior knowledge, and have been used successfully to play Atari games, to navigate 3D environments from pixels, and to control high-degree-of-freedom robots. Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices. Even well-tuned systems exhibit significant instability, both within a trial and across experiment replications. In practice, significant expertise and trial and error are usually required to achieve good performance. One potential source of the problem is known as catastrophic interference: later training decreases performance by overriding previous learning. Interestingly, the powerful generalization that makes neural networks (NNs) so effective in batch supervised learning might explain the challenges encountered when applying them to reinforcement learning tasks. In this paper, we explore how online NN training and interference interact in reinforcement learning. We find that simply re-mapping the input observations to a high-dimensional space improves learning speed and reduces parameter sensitivity. We also show that this preprocessing reduces interference in prediction tasks. More practically, we provide a simple approach to NN training that is easy to implement and requires little additional computation. We demonstrate, with an extensive batch of experiments in classic control domains, that our approach improves performance in both prediction and control.
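The abstract does not spell out how observations are re-mapped, so the sketch below is only an illustrative assumption: a simple tile-coding-style expansion that turns a low-dimensional continuous observation into a sparse, high-dimensional binary vector before it is fed to a neural network. The function name `remap_observation`, the `bins_per_dim` parameter, and the Mountain Car-style bounds in the usage example are hypothetical choices for this sketch, not the paper's actual preprocessing.

```python
import numpy as np

def remap_observation(obs, low, high, bins_per_dim=32):
    """Map a low-dimensional continuous observation to a sparse,
    high-dimensional binary vector by discretizing each dimension
    into one-hot bins (a tile-coding-like expansion).

    obs, low, high: 1-D sequences of the same length.
    Returns a vector of length len(obs) * bins_per_dim.
    """
    obs = np.asarray(obs, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)

    # Normalize each dimension to [0, 1), then pick its bin index.
    scaled = (obs - low) / (high - low)
    idx = np.clip((scaled * bins_per_dim).astype(int), 0, bins_per_dim - 1)

    # Build the sparse binary feature vector: one active unit per dimension.
    features = np.zeros(len(obs) * bins_per_dim)
    features[np.arange(len(obs)) * bins_per_dim + idx] = 1.0
    return features

# Example: a 2-D Mountain Car-style observation becomes a 64-dimensional
# sparse vector that a neural network can take as input.
x = remap_observation(obs=[-0.5, 0.02],
                      low=[-1.2, -0.07],
                      high=[0.6, 0.07])
print(x.shape, x.sum())  # (64,) 2.0
```

The point of such a mapping is that inputs which are close in the original observation space can activate disjoint sets of features, which limits how much an update for one state generalizes to (and interferes with) others.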