Intelligent classification of railway communication equipment manufacturer information based on cluster analysis

    Abstract: Equipment manufacturer information in the railway communication big data platform is not standardized. To address this, the paper proposes a method that classifies manufacturer information intelligently using cluster analysis. It introduces the theory behind cluster analysis algorithms, similarity calculation methods, and clustering performance measures. The text is first preprocessed by word segmentation, construction of a bag-of-words model, and weight conversion, which turns it into weight vectors suitable for classification. K-means clustering and hierarchical clustering were each applied to a subset of the samples, and their test results were compared. Hierarchical clustering was then selected to cluster all samples. This algorithm groups the non-standardized manufacturer information into reasonable classes, which provides data support for building a manufacturer information dictionary.
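The pipeline in the abstract has three stages: preprocessing, weighting, and hierarchical clustering. The following is a minimal sketch of that pipeline, not the paper's implementation. Several choices here are assumptions, and every function name is hypothetical:

- `tokenize` is a regex stand-in for word segmentation. Real Chinese manufacturer names would need a proper Chinese segmenter.
- TF-IDF with smoothed IDF stands in for the unspecified "weight conversion".
- Similarity is cosine distance (1 − cos).
- The hierarchical method is bottom-up average linkage, stopped at an assumed distance threshold of 0.9.
- The sample names are made up for illustration.
- The K-means comparison is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for word segmentation: lowercase, keep runs of word chars.
    # Chinese names without spaces would need a real Chinese segmenter instead.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Bag-of-words term counts, weighted by an assumed smoothed IDF:
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity of two sparse weight vectors (0.0 if either is empty).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_linkage(vectors: list[dict[str, float]], threshold: float) -> list[list[int]]:
    # Agglomerative hierarchical clustering: start with singletons and repeatedly
    # merge the pair of clusters with the smallest average cosine distance,
    # stopping once that distance exceeds `threshold` (an assumed cut-off).
    n = len(vectors)
    dist = [[1.0 - cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Hypothetical non-standard spellings of two manufacturers.
names = [
    "Huawei Technologies Co., Ltd.",
    "Huawei Technologies",
    "HUAWEI",
    "ZTE Corporation",
    "ZTE Corp.",
    "ZTE",
]
vectors = tfidf_vectors([tokenize(s) for s in names])
groups = average_linkage(vectors, threshold=0.9)
for g in groups:
    print([names[i] for i in g])
# ['Huawei Technologies Co., Ltd.', 'Huawei Technologies', 'HUAWEI']
# ['ZTE Corporation', 'ZTE Corp.', 'ZTE']
```

The distance threshold controls how coarse the resulting dictionary entries are:

- **Lower threshold:** more groups, so spelling variants of one manufacturer may end up split apart.
- **Higher threshold:** fewer groups, so distinct manufacturers may be merged.

Unlike K-means, this approach does not require fixing the number of clusters in advance. That may be one reason the method suits manufacturer lists whose true number of distinct vendors is unknown, though the abstract does not state the paper's reasoning.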
