Speech noise reduction model for railway passenger station scene

GAO Zhiqiang, DAI Linlin, JING Hui, WANG Xinyu

Citation: GAO Zhiqiang, DAI Linlin, JING Hui, WANG Xinyu. Speech noise reduction model for railway passenger station scene[J]. Railway Computer Application, 2023, 32(2): 7-12. doi: 10.3969/j.issn.1005-8451.2023.02.02

doi: 10.3969/j.issn.1005-8451.2023.02.02
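Funding: Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (P2020X001)
    About the authors:

    GAO Zhiqiang, engineer

    DAI Linlin, senior engineer

  • CLC number: U291.61 : TN912.3 : TP39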

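  • Abstract: To further improve speech recognition in the noisy environment of railway passenger stations, this paper proposes ConformerGAN, a Conformer-based speech noise reduction model. Its training procedure resembles that of a generative adversarial network (GAN): the generator uses Conformer to extract and model speech features, while the discriminator serves as a surrogate evaluation function that scores the perceptual quality of speech. To strengthen generalization and improve noise reduction on unseen noise, noise is superimposed by mixing in randomly cropped segments, and a noise dataset for railway passenger station scenes is constructed. Comparison with related speech noise reduction models shows that ConformerGAN raises the Perceptual Evaluation of Speech Quality (PESQ) score by 0.19, effectively improving speech recognition accuracy in noisy railway passenger stations and the voice interaction experience of railway passengers.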
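The abstract's noise superimposition step (mixing in randomly cropped noise segments) might look roughly like the sketch below. It is a minimal illustration, not the authors' implementation: the function name `mix_random_noise_segment`, the list-of-floats signal representation, and the scaling to a target signal-to-noise ratio `snr_db` are all assumptions, since the page does not say how the mixing level is chosen.

```python
import math
import random


def mix_random_noise_segment(speech, noise, snr_db, rng=random):
    """Add a randomly cropped noise segment to `speech` at a target SNR in dB.

    `speech` and `noise` are non-empty lists of float samples at the same
    sample rate. The SNR-based scaling is an assumption of this sketch.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    if len(noise) < len(speech):
        # Tile short noise recordings so a full-length crop always exists.
        noise = noise * math.ceil(len(speech) / len(noise))

    # Random start offset: each call sees a different part of the recording.
    start = rng.randint(0, len(noise) - len(speech))
    segment = noise[start:start + len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in segment) / len(segment)
    if noise_power == 0.0:
        return list(speech)  # silent crop: nothing to add

    # Scale the crop so speech_power / scaled_noise_power == 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]
```

Drawing a fresh crop on every call means the same clean utterance is paired with different noise each epoch, which is one plausible reading of how random cropping helps with unseen noise.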
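  • Figure 1  MetricGAN+ model training procedure

    Figure 2  Conformer structure

    Figure 3  ConformerGAN model structure

    Figure 4  Spectrogram of the audio before noise reduction

    Figure 5  Spectrogram of the audio after noise reduction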

    Table 1  Experimental environment configuration

    Item                    Configuration
    Operating system        Linux
    CPU                     Intel® Xeon® CPU E5-2698 v4 @ 2.20 GHz
    GPU                     Tesla V100
    Memory                  251 GB
    Programming language    Python
    Framework               PyTorch

    Table 2  Model evaluation results

    Model                  PESQ    CSIG    CBAK    COVL
    MetricGAN+             3.10    4.02    3.11    3.43
    ConformerGAN (N=2)     3.21    4.30    3.36    3.71
    ConformerGAN (N=4)     3.29    4.55    3.57    3.81
    ConformerGAN (N=12)    3.28    4.40    3.55    3.78
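The 0.19 PESQ gain quoted in the abstract can be checked against the model evaluation table. The sketch below recomputes each ConformerGAN variant's gain over the MetricGAN+ baseline; the dictionary layout and variable names are illustrative choices, and the numbers are copied from the table.

```python
# Scores copied from the model evaluation table (Table 2).
scores = {
    "MetricGAN+": {"PESQ": 3.10, "CSIG": 4.02, "CBAK": 3.11, "COVL": 3.43},
    "ConformerGAN (N=2)": {"PESQ": 3.21, "CSIG": 4.30, "CBAK": 3.36, "COVL": 3.71},
    "ConformerGAN (N=4)": {"PESQ": 3.29, "CSIG": 4.55, "CBAK": 3.57, "COVL": 3.81},
    "ConformerGAN (N=12)": {"PESQ": 3.28, "CSIG": 4.40, "CBAK": 3.55, "COVL": 3.78},
}

baseline = scores["MetricGAN+"]

# Per-metric gain of every other model over the baseline, rounded to the
# two decimals reported in the table.
gains = {
    name: {metric: round(value - baseline[metric], 2) for metric, value in row.items()}
    for name, row in scores.items()
    if name != "MetricGAN+"
}

# Variant with the largest PESQ gain.
best = max(gains, key=lambda name: gains[name]["PESQ"])
print(best, gains[best]["PESQ"])
```

The largest PESQ gain belongs to the N=4 variant (3.29 versus 3.10), which matches the 0.19 figure in the abstract; N=12 does not improve on it.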

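    Table 3  Speech noise reduction results for the station intelligent service robot

    Background noise type    PESQ    CER (before noise reduction)    CER (after noise reduction)
    Station broadcast        3.10    15.67                           10.35
    Manual service desk      3.20    14.32                           10.27
    Ticket gate              3.22    11.68                           9.44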
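Assuming CER in the table above denotes the standard character error rate (the page does not define it), it is the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. A minimal sketch, with a hypothetical function name `cer`:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_char != hyp_char)
            deletion = prev[j] + 1       # reference character missing in hypothesis
            insertion = curr[j - 1] + 1  # extra character in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(reference)
```

For example, `cer("abcd", "abed")` is 0.25: one substitution over four reference characters. Because insertions count as errors, the value can exceed 1 for very long hypotheses.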
Publication history
  • Received:  2022-09-08
  • Published:  2023-02-25
