Abstract:
To address the problem of non-standardized equipment manufacturer information in the railway communication big data platform, this paper proposes an intelligent classification method based on cluster analysis. The paper introduces the theory of cluster analysis, methods for calculating similarity, and measures of clustering performance. Data preprocessing techniques, namely word segmentation, the bag-of-words model, and weight conversion, are used to convert each text record into a weight vector. The K-means and hierarchical clustering algorithms were applied to a subset of the samples and their results compared; on that basis, the hierarchical clustering algorithm was chosen to analyze all samples. The hierarchical clustering algorithm classifies the manufacturer information reasonably and provides support for building a data dictionary.
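The preprocessing and clustering pipeline summarized above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the paper's implementation. A whitespace tokenizer stands in for word segmentation, since real Chinese manufacturer names would need a proper segmenter. "Weight conversion" is assumed to be TF-IDF with smoothed IDF. Similarity is taken to be cosine similarity. The hierarchical step is assumed to be average-linkage agglomerative clustering stopped at a hand-chosen cluster count k. The function names and the sample strings are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed stand-in for word segmentation: lowercase, split on whitespace.
    # Chinese manufacturer names would need a real word segmenter instead.
    return text.lower().split()

def tfidf_vectors(docs):
    """Bag-of-words counts reweighted by smoothed IDF (assumed weighting)."""
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(bags)
    # Document frequency: the number of records in which each term occurs.
    df = Counter(term for bag in bags for term in bag)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in bag.items()}
        for bag in bags
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > max(k, 1):
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise similarity between the two clusters.
                total = sum(sim[a][b] for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if avg > best:
                    best, pair = avg, (i, j)
        i, j = pair
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters

if __name__ == "__main__":
    # Hypothetical manufacturer strings, not data from the paper.
    records = [
        "Alpha Signal Co Ltd", "Alpha Signal Co",
        "Beta Telecom Corp", "Beta Telecom Corporation",
        "Gamma Rail Systems", "Gamma Rail Systems Inc",
    ]
    for group in agglomerative(tfidf_vectors(records), k=3):
        print([records[i] for i in group])
```

Run on the six sample strings with k=3, the sketch groups the two spellings of each hypothetical manufacturer together. Stopping at a fixed k is a simplification; in practice the cut point would be chosen with a clustering performance measure such as those the paper discusses.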