• 查询稿件
  • 获取最新论文
  • 知晓行业信息

基于MapReduce的时序数据离群点挖掘算法

Outlier Mining Algorithm for time series data based on MapReduce

  • 摘要: 针对海量数据中离群点的挖掘,将网格聚类和MapReduce编程模型相结合,排除不可能包含离群点的网格,再用LOF算法对剩余网格中的数据进行离群点检测。为了提高网格聚类的检测精度,本文提出了一种基于聚类半径的改进算法。实验表明了该算法的有效性,同时分析了在节点数不同的情况下,网格聚类所用时间,证明了基于MapReduce的网格聚类适合处理海量时序数据。

     

    Abstract: Aiming at outlier mining in massive time series data, the paper combined grid clustering with MapReduce programming model to exclude grids that was impossible to contain outlier, and then used LOF Algorithm to detect outliers from the rest grids. In order to improve the detection accuracy of the grid clustering, this paper proposed an improved algorithm based on clustering radius. Experimental results showed the effectiveness of the improvement. Experiment also analyzed the execution time grid cluster cost under the circumstances with different number of nodes, which proved it was suitable for handling massive time series data combined MapReduce with grid clustering.

     

/

返回文章
返回