Paper Title

A Unified View of Label Shift Estimation

Paper Authors

Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton

Paper Abstract

Under label shift, the label distribution p(y) might change but the class-conditional distributions p(x|y) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of MLLS. Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error. Our analysis attributes BBSE's statistical inefficiency to a loss of information due to coarse calibration. Experiments on synthetic data, MNIST, and CIFAR10 support our findings.
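To make the two estimators in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. The function names (`bbse_weights`, `mlls_weights`, `run_demo`) are hypothetical, and so is the synthetic setup of three unit-variance Gaussian classes. The classifier is the exact Bayes posterior under the source prior, so it is calibrated by construction. The MLLS objective is maximized here with a standard EM iteration, which is one common choice and is assumed rather than taken from the paper. Both functions estimate the importance weights w(y) = p_target(y) / p_source(y).

```python
import numpy as np

def bbse_weights(src_probs, src_labels, tgt_probs):
    """BBSE sketch: solve C w = mu using hard (argmax) predictions."""
    k = src_probs.shape[1]
    src_pred = src_probs.argmax(axis=1)
    tgt_pred = tgt_probs.argmax(axis=1)
    # C[i, j] estimates p_source(prediction = i, label = j).
    C = np.zeros((k, k))
    np.add.at(C, (src_pred, src_labels), 1.0)
    C /= len(src_labels)
    # mu[i] estimates p_target(prediction = i).
    mu = np.bincount(tgt_pred, minlength=k) / len(tgt_pred)
    # Requires C to be invertible, as the abstract notes.
    return np.linalg.solve(C, mu)

def mlls_weights(tgt_probs, src_prior, n_iter=200):
    """MLLS sketch: EM on the target log-likelihood, using soft outputs."""
    q = src_prior.copy()  # current estimate of the target label prior
    for _ in range(n_iter):
        # E-step: re-weight source posteriors by q / src_prior, renormalize.
        post = tgt_probs * (q / src_prior)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average re-weighted posterior.
        q = post.mean(axis=0)
    return q / src_prior

def sample(prior, n, rng, means):
    """Draw labels from `prior`, then x | y ~ N(means[y], 1)."""
    y = rng.choice(len(prior), size=n, p=prior)
    x = rng.normal(means[y], 1.0)
    return x, y

def posterior(x, prior, means):
    """Exact p(y | x) under `prior` (a calibrated 'classifier')."""
    logp = -0.5 * (x[:, None] - means[None, :]) ** 2 + np.log(prior)
    logp -= logp.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def run_demo(seed=0, n=20000):
    """Hypothetical synthetic experiment; all values are illustrative."""
    rng = np.random.default_rng(seed)
    means = np.array([-2.0, 0.0, 2.0])
    src_prior = np.array([1 / 3, 1 / 3, 1 / 3])
    tgt_prior = np.array([0.6, 0.3, 0.1])
    x_src, y_src = sample(src_prior, n, rng, means)
    x_tgt, _ = sample(tgt_prior, n, rng, means)  # target labels unseen
    src_probs = posterior(x_src, src_prior, means)
    tgt_probs = posterior(x_tgt, src_prior, means)
    w_true = tgt_prior / src_prior
    w_bbse = bbse_weights(src_probs, y_src, tgt_probs)
    w_mlls = mlls_weights(tgt_probs, src_prior)
    return w_true, w_bbse, w_mlls

if __name__ == "__main__":
    for name, w in zip(("true", "BBSE", "MLLS"), run_demo()):
        print(name, np.round(w, 3))
```

The only structural difference between the two functions is what they consume: `bbse_weights` reduces each classifier output to its argmax before matching moments, while `mlls_weights` uses the full probability vector. That mirrors the abstract's point that BBSE discards information through coarse calibration.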
