Paper Title
PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning
Authors
Abstract
Outlier detection is an important task in many data mining applications. Current outlier detection techniques are often designed manually for specific domains and require substantial human effort for database setup, algorithm selection, and hyperparameter tuning. To fill this gap, we present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support, which automatically optimizes an outlier detection pipeline for a new data source at hand. Specifically, we define the search space of the outlier detection pipeline and design a search strategy over that space. PyODDS enables end-to-end execution on top of an Apache Spark backend server and a lightweight database. It also provides unified interfaces and visualizations for users with or without a data science or machine learning background. Finally, we demonstrate PyODDS on several real-world datasets with quantitative analysis and visualization results.
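To make the idea of "a search space plus a search strategy" concrete, here is a minimal sketch in plain Python. It is not PyODDS's API or its actual search strategy. Everything in it is a hypothetical illustration: the `SEARCH_SPACE` dictionary, the two toy 1-D detectors (`zscore_detector`, `knn_detector`), scoring by F1 against a small labeled validation set, and plain random search as the strategy.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical search space: detector name -> hyperparameter -> candidate values.
SEARCH_SPACE: Dict[str, Dict[str, list]] = {
    "zscore": {"threshold": [2.0, 2.5, 3.0]},
    "knn": {"k": [1, 2, 3], "threshold": [1.0, 2.0, 5.0]},
}


def zscore_detector(data: Sequence[float], threshold: float) -> List[bool]:
    """Flag points whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data) or 1.0  # avoid division by zero on constant data
    return [abs(x - mean) / std > threshold for x in data]


def knn_detector(data: Sequence[float], k: int, threshold: float) -> List[bool]:
    """Flag points whose distance to their k-th nearest neighbor exceeds the threshold."""
    labels = []
    for i, x in enumerate(data):
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        labels.append(dists[k - 1] > threshold)
    return labels


DETECTORS = {"zscore": zscore_detector, "knn": knn_detector}


def f1(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """F1 score of predicted outlier flags against ground-truth flags."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_search(
    data: Sequence[float], truth: Sequence[bool], n_trials: int = 20, seed: int = 0
) -> Tuple[float, Optional[str], Optional[dict]]:
    """Random search: sample a detector and its hyperparameters, keep the best F1."""
    rng = random.Random(seed)
    best: Tuple[float, Optional[str], Optional[dict]] = (-1.0, None, None)
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        params = {p: rng.choice(vals) for p, vals in SEARCH_SPACE[name].items()}
        score = f1(DETECTORS[name](data, **params), truth)
        if score > best[0]:
            best = (score, name, params)
    return best


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 10.0]
    truth = [False] * 6 + [True]
    print(random_search(data, truth))
```

Random search is used here only because it is the simplest strategy to show; a real system could swap in grid search, Bayesian optimization, or another strategy without changing the search-space definition. Note also that scoring with labels is an assumption made for the sketch, since outlier detection is often unsupervised in practice.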