Paper Title

Resource-Efficient Neural Networks for Embedded Systems

Authors

Roth, Wolfgang, Schindler, Günther, Klein, Bernhard, Peharz, Robert, Tschiatschek, Sebastian, Fröning, Holger, Pernkopf, Franz, Ghahramani, Zoubin

Abstract

While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques, as well as their potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs, and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.
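To make the two compression techniques named in the abstract concrete, the following NumPy sketch illustrates uniform post-training weight quantization and magnitude-based pruning on a single weight tensor. This is a minimal illustration of the general ideas, not the paper's implementation; the function names and parameters are assumptions chosen for the example.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Uniform post-training quantization: map weights onto 2**num_bits
    evenly spaced levels spanning [w.min(), w.max()], then dequantize."""
    levels = 2 ** num_bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels
    codes = np.round((w - w_min) / scale)   # integer codes in [0, levels]
    return codes * scale + w_min            # dequantized approximation of w

def prune_weights(w, sparsity=0.5):
    """Magnitude pruning: zero out the given fraction of smallest-magnitude
    weights, keeping the rest unchanged."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
wq = quantize_weights(w, num_bits=4)   # at most 16 distinct weight values
wp = prune_weights(w, sparsity=0.5)    # roughly half the weights set to zero
```

Quantization shrinks the memory footprint (each weight is stored as a small integer code plus a shared scale/offset), while pruning introduces sparsity that hardware or sparse kernels can exploit; both trade some approximation error in `w` for resource savings, which is exactly the trade-off the experiments in the paper measure.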
