Paper Title

Describing and Predicting Online Items with Reshare Cascades via Dual Mixture Self-exciting Processes

Authors

Quyu Kong, Marian-Andrei Rizoiu, Lexing Xie

Abstract

It is well-known that online behavior is long-tailed, with most cascaded actions being short and a few being very long. A prominent drawback in generative models for online events is the inability to describe unpopular items well. This work addresses these shortcomings by proposing dual mixture self-exciting processes to jointly learn from groups of cascades. We first start from the observation that maximum likelihood estimates for content virality and influence decay are separable in a Hawkes process. Next, our proposed model, which leverages a Borel mixture model and a kernel mixture model, jointly models the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and supplies interpretable quantities, such as content virality and content influence decay, as well as methods for predicting the final content popularities. On two retweet cascade datasets -- one relating to YouTube videos and the second relating to controversial news articles -- we show that our models capture the differences between online items at the granularity of items, publishers and categories. In particular, we are able to distinguish between far-right, conspiracy, controversial and reputable online news articles based on how they diffuse through social media, achieving an F1 score of 0.945. On holdout datasets, we show that the dual mixture model provides, for reshare diffusion cascades especially unpopular ones, better generalization performance and, for online items, accurate item popularity predictions.
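The link the abstract draws between content virality and cascade size can be illustrated with a small sketch. It rests on assumptions the abstract does not state: each cascade starts from a single seed event, and its final size follows a Borel distribution whose parameter `mu` is the branching factor (the expected number of direct reshares per event, with `0 < mu < 1`). The function names are hypothetical, and this is a single-component illustration, not the paper's dual mixture model. Under these assumptions, `mu` can be estimated from cascade sizes alone, without fitting any decay kernel to the timestamps.

```python
import math
from typing import Sequence


def borel_log_pmf(k: int, mu: float) -> float:
    """Log-probability of a cascade with exactly k events under Borel(mu).

    P(N = k) = (mu * k)^(k - 1) * exp(-mu * k) / k!,  for k = 1, 2, ...
    """
    if k < 1:
        raise ValueError("cascade size must be at least 1")
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    # lgamma(k + 1) equals log(k!) and avoids overflow for large k.
    return (k - 1) * math.log(mu * k) - mu * k - math.lgamma(k + 1)


def fit_branching_factor(sizes: Sequence[int]) -> float:
    """Closed-form maximum likelihood estimate of mu from cascade sizes.

    Setting the derivative of the Borel log-likelihood to zero gives
    mu_hat = 1 - (number of cascades) / (total number of events).
    """
    if not sizes or any(s < 1 for s in sizes):
        raise ValueError("need at least one cascade, each of size >= 1")
    return 1.0 - len(sizes) / sum(sizes)


if __name__ == "__main__":
    # Made-up cascade sizes, for illustration only.
    example_sizes = [1, 1, 2, 4]
    mu_hat = fit_branching_factor(example_sizes)
    print(f"estimated branching factor: {mu_hat:.3f}")
    print(f"P(size = 1): {math.exp(borel_log_pmf(1, mu_hat)):.4f}")
```

Because the estimate depends only on the number of cascades and the total event count, groups with many short cascades still yield a usable value. That is one intuition for why pooling cascades may help describe unpopular items.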
