Title
An interpretable semi-supervised classifier using two different strategies for amended self-labeling
Authors
Abstract
In the context of some machine learning applications, obtaining data instances is a relatively easy process, but labeling them can become quite expensive or tedious. Such scenarios lead to datasets with few labeled instances and a large number of unlabeled ones. Semi-supervised classification techniques combine labeled and unlabeled data during the learning phase in order to increase the classifier's generalization capability. Regrettably, most successful semi-supervised classifiers do not allow explaining their outcome, thus behaving like black boxes. However, there is an increasing number of problem domains in which experts demand a clear understanding of the decision process. In this paper, we report on an extended experimental study presenting an interpretable self-labeling grey-box classifier that uses a black box to estimate the missing class labels and a white box to explain the final predictions. Two different approaches for amending the self-labeling process are explored: the first based on the confidence of the black box, and the second based on measures from Rough Set Theory. The results of the extended experimental study support the interpretability of our classifier, by means of its transparency and simplicity, while attaining superior prediction rates compared with state-of-the-art self-labeling classifiers reported in the literature.
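The confidence-based variant of the scheme described above can be sketched in a few lines: a black box is trained on the labeled instances, its high-confidence predictions on the unlabeled pool are kept (the amending step), and a white box is then trained on the enlarged set to produce interpretable final predictions. This is a minimal illustrative sketch, not the paper's implementation; the choice of models (random forest as black box, shallow decision tree as white box), the 20% labeling rate, and the 0.8 confidence threshold are all assumptions.

```python
# Hedged sketch of a confidence-amended self-labeling grey-box classifier.
# Model choices, labeling rate, and threshold are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # black box (assumed choice)
from sklearn.tree import DecisionTreeClassifier      # white box (assumed choice)

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=400, random_state=0)
labeled = rng.rand(len(y)) < 0.2          # only ~20% of instances are labeled
X_l, y_l = X[labeled], y[labeled]
X_u = X[~labeled]                         # unlabeled pool

# 1) Train the black box on the few labeled instances.
black = RandomForestClassifier(random_state=0).fit(X_l, y_l)

# 2) Amend self-labeling by confidence: keep only predictions the
#    black box is sure about (threshold 0.8 is an assumption).
proba = black.predict_proba(X_u)
conf = proba.max(axis=1)
keep = conf >= 0.8

# 3) Train the white box on labeled + confidently self-labeled data.
X_aug = np.vstack([X_l, X_u[keep]])
y_aug = np.concatenate([y_l, proba.argmax(axis=1)[keep]])
white = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_aug, y_aug)

# The shallow tree gives the interpretable final predictions.
print(len(y_aug) > len(y_l), white.get_depth() <= 4)
```

The second amending strategy mentioned in the abstract would replace step 2 with inclusion criteria derived from Rough Set Theory measures rather than a raw probability threshold.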