Title
Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection
Authors
Abstract
In scene understanding, robots benefit not only from detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) detection infers the action predicate of a <human, predicate, object> triplet. Contextual information has been found critical for inferring interactions. However, most works use only local features from a single human-object pair for inference. Few works have studied the disambiguating contribution of the subsidiary relations made available by graph networks, and similarly few have learned to effectively leverage visual cues together with the intrinsic semantic regularities contained in HOIs. We contribute a dual-graph attention network that dynamically aggregates contextual visual, spatial, and semantic information from primary human-object relations as well as subsidiary relations through attention mechanisms, yielding strong disambiguating power. We achieve comparable results on two benchmarks: V-COCO and HICO-DET. Code is available at \url{https://github.com/birlrobotics/vs-gats}.
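As an illustration of the attention-based contextual aggregation the abstract describes, the following is a minimal PyTorch sketch of one round of attention-weighted message passing over a fully connected instance graph. It is not the authors' implementation (see the linked repository); the feature dimensions, the attention-scoring MLP, and the node-update rule are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """One round of attention-weighted message passing over a fully
    connected instance graph (nodes = detected humans and objects).
    A sketch only: dimensions and the scoring MLP are assumptions,
    not the architecture from the paper's repository."""

    def __init__(self, node_dim: int, edge_dim: int):
        super().__init__()
        # Scores how relevant neighbor j is to node i, given both node
        # features and their pairwise (e.g., spatial) edge feature.
        self.attn = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, node_dim),
            nn.ReLU(),
            nn.Linear(node_dim, 1),
        )
        # Fuses each node's own feature with its aggregated context.
        self.update = nn.Linear(2 * node_dim, node_dim)

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # nodes: (N, node_dim); edges: (N, N, edge_dim)
        n = nodes.size(0)
        src = nodes.unsqueeze(1).expand(n, n, -1)  # node i, broadcast over j
        dst = nodes.unsqueeze(0).expand(n, n, -1)  # node j, broadcast over i
        logits = self.attn(torch.cat([src, dst, edges], dim=-1)).squeeze(-1)
        # Mask self-attention so each node attends only to other nodes.
        logits = logits.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
        alpha = F.softmax(logits, dim=1)           # attention weights per node
        context = alpha @ nodes                    # (N, node_dim) aggregated message
        return F.relu(self.update(torch.cat([nodes, context], dim=-1)))

# Toy usage: four detected instances with 256-d appearance features
# and 64-d pairwise spatial features (both dimensions assumed).
layer = GraphAttentionLayer(node_dim=256, edge_dim=64)
out = layer(torch.randn(4, 256), torch.randn(4, 4, 64))
print(out.shape)  # torch.Size([4, 256])

In the paper's dual-graph framing, one would run such attention-weighted aggregation over both a visual-feature graph and a semantic (word-embedding) graph and fuse the results per human-object pair; the single generic layer above is only meant to make the aggregation step concrete.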