SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

Paper title

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

Authors

Javier Picorel, Seyed Alireza Sanaee Kohroudi, Zi Yan, Abhishek Bhattacharjee, Babak Falsafi, Djordje Jevdjic

Abstract

Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce the TLB size or increase its reach. However, such restrictions are unattractive because they forgo many of the original benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide and conquer approach to address translation. SPARTA splits the address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while the translation for main memory accesses is performed by shared memory-side TLBs. Performing the translation for memory accesses on the memory side allows SPARTA to overlap data fetch with translation, and avoids the replication of TLB entries for data shared among accelerators. To further improve the performance and efficiency of the memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57%. At the same time, SPARTA requires minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionalities.
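The memory-side half of this split can be pictured with a toy software model. The sketch below is illustrative only and is not the paper's hardware design. It assumes a flat dictionary as the page table, 4 KiB pages, four partitions, and a partition chosen by the low bits of the virtual page number. All names (`TinyTLB`, `MemorySideTranslator`, `partition_of`) are hypothetical. It shows only that each partition owns a small TLB shared by every requester, so a page touched by several accelerators takes a single entry. The overlap of data fetch with translation is not modeled.

```python
from collections import OrderedDict

PAGE_SIZE = 4096        # assumed page size in bytes
NUM_PARTITIONS = 4      # assumed number of memory partitions


class TinyTLB:
    """Small fully associative TLB with LRU replacement (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> pfn, oldest entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        pfn = page_table[vpn]                  # stands in for a page walk
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return pfn


def partition_of(vpn):
    """Assumed mapping: the low bits of the VPN select the partition."""
    return vpn % NUM_PARTITIONS


class MemorySideTranslator:
    """One small TLB per memory partition, shared by all requesters."""

    def __init__(self, page_table, entries_per_partition=8):
        self.page_table = page_table
        self.tlbs = [TinyTLB(entries_per_partition)
                     for _ in range(NUM_PARTITIONS)]

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        part = partition_of(vpn)               # known before translation
        pfn = self.tlbs[part].lookup(vpn, self.page_table)
        return part, pfn * PAGE_SIZE + offset
```

In this model, two accelerators that call `translate` on the same virtual page share one entry in that partition's TLB: the first access misses and the second hits, with nothing replicated on the accelerator side.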
