
Micro-blog noise filtering and topic detection

  • Abstract: To address the large volume of advertisements and other noise posts on micro-blogs, this paper proposes a user-classification filtering mechanism based on the C4.5 decision tree classification algorithm and a score-based filtering method built on feature values. Exploiting the real-time nature of micro-blog text and the time-sensitivity of micro-blog topics, it also proposes a similarity calculation method that incorporates a time parameter. Experimental results show that, compared with the traditional approach, the method improves both the accuracy and the efficiency of noise filtering and topic detection.
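The abstract does not give the formula for the time-based similarity, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that similarity between two posts is the cosine similarity of their term counts, scaled by an exponential decay in the gap between their timestamps. The function names, the `half_life_hours` parameter, and the decay form are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_weighted_similarity(tokens_a, time_a, tokens_b, time_b,
                             half_life_hours=24.0):
    """Text similarity damped by the time gap between two posts.

    Assumption (not from the paper): the weight halves every
    `half_life_hours` hours of separation. Timestamps are in seconds.
    """
    text_sim = cosine_similarity(Counter(tokens_a), Counter(tokens_b))
    gap_hours = abs(time_a - time_b) / 3600.0
    decay = 0.5 ** (gap_hours / half_life_hours)
    return text_sim * decay
```

Under these assumptions, two posts with identical wording score close to 1.0 when published at the same moment and close to 0.5 when published one half-life apart, so older posts contribute less to a topic that is currently forming.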

     
