Paper Title
Extraction of Templates from Phrases Using Sequence Binary Decision Diagrams
Paper Authors
Paper Abstract
The extraction of templates such as ``regard X as Y'' from a set of related phrases requires the identification of their internal structures. This paper presents an unsupervised approach for extracting templates on-the-fly from only tagged text by using a novel relaxed variant of the Sequence Binary Decision Diagram (SeqBDD). A SeqBDD can compress a set of sequences into a graphical structure that is equivalent to a minimal DFA but is more compact and better suited to the task of template extraction. The main contribution of this paper is a relaxed form of the SeqBDD construction algorithm that enables it to form general representations from a small amount of data. The compression of shared structures in the text during Relaxed SeqBDD construction naturally induces the templates we wish to extract. Experiments show that the method is capable of high-quality extraction on tasks based on verb+preposition templates from corpora and phrasal templates from short messages from social media.
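To make the compression idea concrete, the following is a minimal sketch of a standard (non-relaxed) SeqBDD, not the relaxed construction the paper contributes. It assumes the usual definition: a node (label, hi, lo) represents the sequences that start with label followed by anything in hi, plus everything in lo; a node whose hi branch is the empty set is dropped; and identical nodes are stored only once. The class name `SeqBDD`, its methods, and the example phrases are hypothetical and chosen only for illustration.

```python
# Terminals: ZERO is the empty set, ONE is the set holding only the empty sequence.
ZERO, ONE = 0, 1


class SeqBDD:
    """Illustrative standard SeqBDD with a unique table for node sharing."""

    def __init__(self):
        self.nodes = [None, None]  # ids 0 and 1 are reserved for the terminals
        self.unique = {}           # (label, hi, lo) -> node id

    def node(self, label, hi, lo):
        # Zero-suppression rule: a node whose 1-edge leads to ZERO is redundant.
        if hi == ZERO:
            return lo
        key = (label, hi, lo)
        if key not in self.unique:  # share structurally identical nodes
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def build(self, seqs):
        return self._build(frozenset(tuple(s) for s in seqs))

    def _build(self, seqs):
        if not seqs:
            return ZERO
        heads = [s[0] for s in seqs if s]
        if not heads:               # only the empty sequence remains
            return ONE
        x = min(heads)              # labels are ordered along 0-edges
        hi = frozenset(s[1:] for s in seqs if s and s[0] == x)
        lo = frozenset(s for s in seqs if not s or s[0] != x)
        return self.node(x, self._build(hi), self._build(lo))

    def members(self, n):
        """Enumerate the set of sequences represented by node n."""
        if n == ZERO:
            return set()
        if n == ONE:
            return {()}
        label, hi, lo = self.nodes[n]
        return {(label,) + s for s in self.members(hi)} | self.members(lo)


phrases = [
    ("regard", "X", "as", "Y"),
    ("view", "X", "as", "Y"),
    ("think", "of", "X", "as", "Y"),
]
bdd = SeqBDD()
root = bdd.build(phrases)
print(len(bdd.unique), "nodes for", sum(len(p) for p in phrases), "tokens")
```

In this toy input the shared tail `X as Y` is stored once and reached from all three phrases, which is the kind of shared structure the abstract describes as inducing templates. The relaxed variant, which generalizes from small amounts of data, is not reproduced here.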