Risk user identification based on horizontal federated learning
Abstract: Ticket-grabbing services launched by third-party platforms have placed considerable pressure on the China Railway 12306 Internet ticketing system (12306 for short). To ensure the stability of 12306 and the fairness of ticket purchasing, risk users urgently need to be identified. Because 12306 is deployed across different physical locations and aggregating data from different centers carries risk, this paper studies a risk user identification method based on horizontal federated learning under the condition that user data is dispersed. Based on user access behavior, the paper constructs and extracts user features, builds horizontal federated learning models using algorithms such as XGBoost, logistic regression, and neural networks, and validates the models. Experimental results show that the horizontal federated learning model based on XGBoost identifies risk users well, providing technical support for the safe use of railway data.
Keywords:
- 12306 Internet ticketing system
- horizontal federated learning
- XGBoost
- risk users
- neural network
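As a rough illustration of the horizontal setting described in the abstract, where each center holds different users with the same feature columns and raw records never leave the center, the sketch below trains a logistic regression with FedAvg-style weighted parameter averaging. It is a toy under stated assumptions: the synthetic data, the helper names (`local_update`, `fed_avg`, `make_center`), the learning rate, and the round count are all hypothetical, and the paper's actual aggregation protocol and any encryption it uses are not reproduced here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Probability of the positive (risk) class for one feature row."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def local_update(weights, rows, labels, lr=0.5, epochs=5):
    """Full-batch gradient descent on one center's private data (hypothetical helper)."""
    w = list(weights)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fed_avg(center_weights, center_sizes):
    """Average the centers' weight vectors, weighted by their sample counts."""
    total = sum(center_sizes)
    dim = len(center_weights[0])
    return [
        sum(w[j] * n for w, n in zip(center_weights, center_sizes)) / total
        for j in range(dim)
    ]


def make_center(n, seed):
    """Synthetic stand-in for one center's data; not the paper's features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append([1.0, x1, x2])  # leading 1.0 carries the intercept
        labels.append(1 if x1 + x2 > 0 else 0)
    return rows, labels


def accuracy(w, rows, labels):
    hits = sum((predict(w, x) >= 0.5) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)


# Two centers with the same feature space but disjoint users.
centers = [make_center(200, seed=1), make_center(300, seed=2)]
global_w = [0.0, 0.0, 0.0]

for _ in range(20):  # communication rounds
    # Each center trains locally; only its weight vector is shared.
    local = [local_update(global_w, rows, labels) for rows, labels in centers]
    global_w = fed_avg(local, [len(rows) for rows, _ in centers])

for i, (rows, labels) in enumerate(centers, start=1):
    print(f"center {i} accuracy: {accuracy(global_w, rows, labels):.3f}")
```

Only weight vectors cross center boundaries in this sketch. A deployed system would likely add protections such as secure aggregation on top, which is outside the scope of this example.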
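Table 1 Weights of selected parameters of the Fed_lr model

| Parameter | Weight |
| --- | --- |
| Intercept (constant term) | -1.978 |
| len_full | -1.03212 |
| min_dur | -0.94088 |
| len_uniq | -0.89904 |
| getwaittime_num_5min | -0.37734 |
| confirmpassengerinfosingle_num_15min | -0.29207 |
| url3 | -0.28003 |
| querypassenger_num_5min | -0.24953 |

Table 2 Metric results on the center 1 dataset

| Model | AUC | F1-score | Accuracy | Recall | Precision |
| --- | --- | --- | --- | --- | --- |
| Fed_XGb | 0.9856 | 0.9061 | 0.9591 | 0.9490 | 0.8670 |
| Fed_lr | 0.9545 | 0.7948 | 0.9036 | 0.8981 | 0.7129 |
| Fed_nn | 0.9550 | 0.8130 | 0.9102 | 0.9393 | 0.7167 |
| XGBoost | 0.9868 | 0.8828 | 0.9510 | 0.9389 | 0.8330 |

Table 3 Metric results on the center 2 dataset

| Model | AUC | F1-score | Accuracy | Recall | Precision |
| --- | --- | --- | --- | --- | --- |
| Fed_XGb | 0.9837 | 0.8715 | 0.9444 | 0.9491 | 0.8056 |
| Fed_lr | 0.9609 | 0.8168 | 0.9186 | 0.9135 | 0.7387 |
| Fed_nn | 0.9680 | 0.8598 | 0.9394 | 0.9364 | 0.7948 |
| XGBoost | 0.9825 | 0.8543 | 0.9335 | 0.9443 | 0.7800 |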
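The F1-score column in these tables is the harmonic mean of the Precision and Recall columns. As a small sanity check, the sketch below recomputes F1 for the center 1 rows from the reported precision and recall. The dictionary name `center1` and the helper `f1_score` are illustrative, not from the paper, and because the inputs are already rounded to four decimals the recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the center 1 results table.
center1 = {
    "Fed_XGb": (0.8670, 0.9490, 0.9061),
    "Fed_lr": (0.7129, 0.8981, 0.7948),
    "Fed_nn": (0.7167, 0.9393, 0.8130),
    "XGBoost": (0.8330, 0.9389, 0.8828),
}

for name, (p, r, reported) in center1.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {reported:.4f}")
```

The same check applies unchanged to the center 2 rows.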