Abstract:
With the sharp increase in the freight volume of China Railway Nanning Group Co., Ltd. (hereinafter, Nanning Bureau) in recent years, the data volume of its existing freight business information systems has grown rapidly. Because the data is scattered across the independently built databases of each system, complex freight business queries and online analysis that span multiple databases perform poorly. To fully exploit the value of freight business data assets, this paper adopts a data warehouse product based on a massively parallel processing (MPP) cluster architecture to integrate, process, and efficiently store massive, multi-source freight business data. This reduces the response time of online analytical queries from the minute level of traditional databases to the second level, and lays the foundation for research on data mining applications. An improved K-means clustering algorithm and a Naive Bayes classification algorithm are then applied to freight customer value analysis and missed-loading prediction, respectively. The results show that a freight customer segmentation model based on the K-means algorithm helps the freight department quickly identify the value of different customers and provides a reliable basis for targeting freight marketing efforts and adjusting pricing strategy. The missed-loading prediction model is built on the Naive Bayes algorithm, and its predictions help freight organization anticipate risk. Together, these results help shift Nanning Bureau's freight management from experience-driven to data-driven and support the high-quality development of its freight business.