
Named entity recognition of railway freight competitive pricing strategy based on RoBERTa-BiLSTM-CRF model

DU Wenran, JIN Zheng, DAI Mingrui, XUE Rui, WU Shuang

Citation: DU Wenran, JIN Zheng, DAI Mingrui, XUE Rui, WU Shuang. Named entity recognition of railway freight competitive pricing strategy based on RoBERTa-BiLSTM-CRF model[J]. Railway Computer Application, 2023, 32(5): 11-15. DOI: 10.3969/j.issn.1005-8451.2023.05.03


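Funding: Key Project of the Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (N2021S006)
Details
    About the authors:

    DU Wenran, research intern

    JIN Zheng, auditor

  • CLC number: U294 : F532.5 : TP39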


  • Abstract: To improve the efficiency of railway freight auditing, this paper addresses the text data of the railway freight competitive pricing strategy (hereafter "the pricing strategy") and designs a RoBERTa (Robustly optimized Bidirectional Encoder Representation from Transformers)-BiLSTM (Bidirectional Long Short-Term Memory)-CRF (Conditional Random Field) model based on data augmentation. It introduces the data annotation strategy and describes in detail the overall architecture of the model and the sample data augmentation method. The model was then validated in application. The validation results show that the RoBERTa-BiLSTM-CRF model improves significantly on the two other traditional models across all performance evaluation metrics for named entity recognition in the pricing strategy, identifies named entity information in the pricing strategy more accurately, and can assist railway freight auditors in their audit work.
  • Figure 1   RoBERTa-BiLSTM-CRF model architecture

    Figure 2   RoBERTa layer architecture

    Figure 3   BiLSTM layer network architecture

    Figure 4   Loss function curve during training

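    Table 1   Named entities in the pricing strategy

    No.   Entity name                          Label
    1     Project number                       N (Number)
    2     Shipper                              P (People)
    3     Price difference coefficient         C (Coefficient)
    4     Assessment validity period           T (Time)
    5     Assessed freight volume              F (Freight)
    6     Origin station name                  S (Start)
    7     Added origin station name            NS (New-Start)
    8     Cancelled origin station name        CS (Cancel-Start)
    9     Destination station name             A (Arrive)
    10    Added destination station name       NA (New-Arrive)
    11    Cancelled destination station name   CA (Cancel-Arrive)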
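
    To illustrate how the label identifiers in Table 1 might be used for sequence labeling, the sketch below builds a tag set from the 11 labels and decodes a tagged character sequence back into entity spans. The BIO scheme ("B-" opens an entity, "I-" continues it, "O" is outside), the function name `extract_entities`, and the sample string are assumptions made for illustration; this page does not state which tagging scheme the authors used.

```python
from typing import List, Optional, Tuple

# Label identifiers copied from Table 1.
ENTITY_LABELS = ["N", "P", "C", "T", "F", "S", "NS", "CS", "A", "NA", "CA"]

# Assumed BIO scheme: one "B-" and one "I-" tag per label, plus "O".
TAGS = ["O"] + [f"{prefix}-{label}" for label in ENTITY_LABELS for prefix in ("B", "I")]


def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode a character-level BIO tag sequence into (label, text) pairs."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    entities: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_chars: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the one that was open, if any.
            if current_label is not None:
                entities.append((current_label, "".join(current_chars)))
            current_label, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_chars.append(ch)
        else:
            # "O" or an inconsistent "I-" tag closes any open entity.
            if current_label is not None:
                entities.append((current_label, "".join(current_chars)))
            current_label, current_chars = None, []
    if current_label is not None:
        entities.append((current_label, "".join(current_chars)))
    return entities


# Hypothetical input: a project number "A01" and a price difference coefficient "0.9".
sample_chars = list("A01:0.9")
sample_tags = ["B-N", "I-N", "I-N", "O", "B-C", "I-C", "I-C"]
print(extract_entities(sample_chars, sample_tags))
```

    With 11 entity types this assumed scheme yields 23 tags, which would be the output dimension of the CRF layer in a model of this kind.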

    Table 2   Comparison of model evaluation metrics

    Model                 P        R        F
    BiLSTM-CRF            89.38%   90.10%   89.74%
    BERT-BiLSTM-CRF       91.15%   90.29%   90.72%
    RoBERTa-BiLSTM-CRF    94.69%   92.52%   93.59%
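
    The P, R, and F columns in the model comparison table can be cross-checked with a short calculation. The sketch below assumes that F is the standard F1 score, that is, the harmonic mean of precision and recall; this page does not define F, but the reported values agree with that definition to two decimal places. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, copied from the model comparison table.
TABLE_2 = {
    "BiLSTM-CRF": (89.38, 90.10, 89.74),
    "BERT-BiLSTM-CRF": (91.15, 90.29, 90.72),
    "RoBERTa-BiLSTM-CRF": (94.69, 92.52, 93.59),
}

for model, (p, r, reported_f) in TABLE_2.items():
    computed = f1_score(p, r)
    print(f"{model}: computed F = {computed:.2f}%, reported F = {reported_f:.2f}%")
```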

    Table 3   Comparison of evaluation metrics for models trained on different training samples

    Training samples    P        R        F
    Not augmented       87.61%   88.89%   88.25%
    Augmented           94.69%   92.52%   93.59%
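
    This page reports the effect of sample augmentation but does not describe the augmentation procedure itself. As a rough illustration only, the sketch below shows entity substitution, one common augmentation technique for named entity recognition: an entity in a labeled sample is swapped for another value of the same type and the tags are regenerated. The technique, the BIO tag format, the function name `substitute_entity`, and the sample values are all assumptions, not the authors' method.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[str], List[str]]  # (characters, BIO tags)


def substitute_entity(sample: Sample, label: str, candidates: List[str],
                      rng: random.Random) -> Sample:
    """Replace the first entity of type `label` with a randomly chosen candidate."""
    if not candidates or any(not c for c in candidates):
        raise ValueError("candidates must be a non-empty list of non-empty strings")
    chars, tags = sample
    try:
        start = tags.index(f"B-{label}")
    except ValueError:
        return sample  # no entity of this type: return the sample unchanged
    end = start + 1
    while end < len(tags) and tags[end] == f"I-{label}":
        end += 1
    new_text = rng.choice(candidates)
    new_chars = chars[:start] + list(new_text) + chars[end:]
    new_tags = (tags[:start] + [f"B-{label}"]
                + [f"I-{label}"] * (len(new_text) - 1) + tags[end:])
    return new_chars, new_tags


# Hypothetical sample: swap the price difference coefficient (label C).
rng = random.Random(0)
original = (list("A01:0.9"), ["B-N", "I-N", "I-N", "O", "B-C", "I-C", "I-C"])
aug_chars, aug_tags = substitute_entity(original, "C", ["0.85", "0.95"], rng)
print("".join(aug_chars), aug_tags)
```

    Because the tags are rebuilt from the length of the substituted text, each augmented sample stays aligned character by character, which is what a sequence labeling model needs.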
Publication history
  • Received:  2022-11-03
  • Published online:  2023-05-28
  • Issue date:  2023-05-24
