
Micro-blog Noise Filtering and Topic Detection

XI Haohan, LIU Yun, XIONG Fei

Citation: XI Haohan, LIU Yun, XIONG Fei. Micro-blog noise filtering and topic detection[J]. Railway Computer Application, 2015, 24(3): 19-22.

Funding: National Natural Science Foundation of China (61172072); Fundamental Research Funds for the Central Universities (2014-JBM018)
Author information: XI Haohan, master's student; LIU Yun, professor.

  • CLC number: U285:TP39

Micro-blog noise filtering and topic detection

  • Abstract: Micro-blogs are flooded with advertisements and other noise posts. To address this, the paper proposes a user-classification filtering mechanism based on the C4.5 decision tree algorithm and a score-based filtering method built on feature values. Exploiting the real-time nature of micro-blog text and the timeliness of micro-blog topics, it also proposes a similarity calculation method that incorporates a time parameter. Experimental results show that, compared with the traditional approach, the method improves the accuracy and efficiency of noise filtering and topic detection.
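
    The idea of a similarity measure that incorporates a time parameter can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the function names, the bag-of-words cosine similarity, the exponential half-life decay, and the `half_life_hours` parameter are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Plain cosine similarity between two bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_weighted_similarity(tokens_a, time_a, tokens_b, time_b,
                             half_life_hours=24.0):
    """Cosine similarity damped by the posting-time gap.

    Assumption (not from the paper): the weight halves every
    `half_life_hours` hours between the two posts. Times are
    Unix timestamps in seconds.
    """
    gap_hours = abs(time_a - time_b) / 3600.0
    decay = 0.5 ** (gap_hours / half_life_hours)
    return cosine_similarity(tokens_a, tokens_b) * decay
```

    Under these assumptions, two posts with identical wording score close to 1 when posted at the same moment and about half that when posted one half-life apart, so older posts are less likely to be merged into a current topic.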
Publication history
  • Received: 2014-09-24
  • Published: 2015-03-24
