Paper title
Learning What to Learn for Video Object Segmentation
Paper authors
Abstract
Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result.
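To make the idea of an internal learner that minimizes a first-frame segmentation error more concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper. It assumes each pixel is described by a small hand-made feature vector, uses a linear target model, and fits it to the raw first-frame mask with a few steepest-descent steps, whereas the abstract describes learning end-to-end what the few-shot learner should learn. All names (`fit_target_model`, `segment`, the toy features) are hypothetical and chosen only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def loss(w, features, labels, reg):
    """Regularized squared segmentation error of the linear model w."""
    data = sum((dot(w, x) - y) ** 2 for x, y in zip(features, labels))
    return 0.5 * data + 0.5 * reg * dot(w, w)


def fit_target_model(features, labels, steps=5, reg=0.01):
    """Fit linear weights on first-frame pixels with steepest descent.

    Assumption: a linear model and a fixed number of unrolled steps stand in
    for the differentiable few-shot learner; this is only an illustration.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(steps):
        # Gradient of the regularized squared error with respect to w.
        residuals = [dot(w, x) - y for x, y in zip(features, labels)]
        grad = [
            sum(r * x[j] for r, x in zip(residuals, features)) + reg * w[j]
            for j in range(dim)
        ]
        # Exact line search for a quadratic loss: step = g.g / (g.H.g).
        x_grad = [dot(x, grad) for x in features]
        curvature = dot(x_grad, x_grad) + reg * dot(grad, grad)
        if curvature == 0.0:
            break
        step = dot(grad, grad) / curvature
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def segment(features, w, threshold=0.5):
    """Label each pixel as target (1) or background (0)."""
    return [1 if dot(w, x) > threshold else 0 for x in features]


# Toy first frame: two target pixels and two background pixels.
first_frame_features = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
first_frame_mask = [1.0, 1.0, 0.0, 0.0]

weights = fit_target_model(first_frame_features, first_frame_mask)

# Toy later frame: one target-like pixel and one background-like pixel.
later_frame_features = [[0.8, 0.2], [0.2, 0.8]]
predicted = segment(later_frame_features, weights)
print(predicted)
```

Because every operation in the fitting loop is differentiable, the same unrolled steps could in principle sit inside a larger network and be trained end-to-end, which is the property the abstract relies on. The choice of features and labels fed to the learner is left fixed in this sketch.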