Paper title
Spatial-Scale Aligned Network for Fine-Grained Recognition
Paper authors
Abstract
Existing approaches to fine-grained visual recognition focus on learning marginal region-based representations while neglecting spatial and scale misalignments, leading to inferior performance. In this paper, we propose the spatial-scale aligned network (SSANET), which implicitly addresses misalignments during the recognition process. Specifically, SSANET consists of 1) a self-supervised proposal mining formula with Morphological Alignment Constraints; 2) a discriminative scale mining (DSM) module, which exploits the feature pyramid via a circulant matrix and provides a Fourier solver for fast scale alignment; and 3) an oriented pooling (OP) module, which performs the pooling operation in several pre-defined orientations. Each orientation defines one kind of spatial alignment, and the network automatically determines through learning which alignment is optimal. With the two proposed modules, our algorithm can automatically determine accurate local proposal regions and generate more robust target representations that are invariant to various appearance variations. Extensive experiments verify that SSANET is capable of learning better spatial-scale invariant target representations, yielding superior performance on the fine-grained recognition task on several benchmarks.
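The abstract does not spell out the DSM formulation, so the following is only a generic illustration of why circulant structure admits a Fourier solver; it is not the authors' implementation. A circulant matrix is diagonalized by the discrete Fourier transform, so a linear system C x = b reduces to an elementwise division in the frequency domain. All names here (`dft`, `circulant`, `solve_circulant`, `matvec`) and the sample vectors are hypothetical, and a plain O(n^2) DFT is used to keep the sketch dependency-free; an FFT routine would give the O(n log n) cost that makes such solvers fast.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); an FFT would replace this in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circulant(c):
    """Build the circulant matrix with first column c: C[i][j] = c[(i - j) % n]."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def solve_circulant(c, b):
    """Solve C x = b for real c and b, where C is the circulant matrix of c.

    C x is the circular convolution of c and x, so DFT(b) = DFT(c) * DFT(x)
    elementwise and x = IDFT(DFT(b) / DFT(c)).
    Assumption: no entry of DFT(c) is zero (C is invertible).
    """
    c_hat = dft(c)
    b_hat = dft(b)
    x_hat = [bk / ck for bk, ck in zip(b_hat, c_hat)]
    # Real inputs give a real solution; drop the numerical imaginary residue.
    return [v.real for v in dft(x_hat, inverse=True)]


def matvec(m, x):
    """Dense matrix-vector product, used only to check the solution."""
    return [sum(a * v for a, v in zip(row, x)) for row in m]


# Hypothetical sample data for illustration only.
c = [4.0, 1.0, 0.0, 1.0]
b = [1.0, 2.0, 3.0, 4.0]
x = solve_circulant(c, b)

# Verify against the explicit dense matrix: C x should reproduce b.
residual = max(abs(r - t) for r, t in zip(matvec(circulant(c), x), b))
```

The dense matrix is built here only to check the result; the solver itself never forms it, which is the practical appeal of the frequency-domain route.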