Abstract:
To further improve speech recognition in the noisy environment of railway stations, this paper proposes ConformerGAN, a Conformer-based generative adversarial network (GAN) for speech noise reduction. Its training process is similar to that of a GAN: the generator uses the Conformer to extract and model speech features, while the discriminator constructs a proxy evaluation function to assess the perceptual quality of speech. To strengthen the model's generalization and improve its noise reduction on unseen noise, noise is overlaid on the speech using randomly cropped noise fragments. The paper also builds a station-scene noise dataset. Compared with related models, ConformerGAN improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.19, effectively improves speech recognition accuracy in the noisy environment of railway passenger stations, and improves the voice interaction experience of railway passengers.
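The noise overlay step can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `overlay_random_noise`, the `snr_db` parameter, and the scaling of the fragment to a target signal-to-noise ratio are assumptions made for illustration, since the abstract only states that noise fragments are randomly cropped and overlaid. Plain Python lists stand in for audio sample arrays.

```python
import math
import random


def overlay_random_noise(clean, noise, snr_db, rng=random):
    """Overlay a randomly cropped noise fragment on `clean`.

    Hypothetical helper: scaling to `snr_db` is an assumption,
    not something the abstract specifies.
    """
    n = len(clean)
    if len(noise) < n:
        # Repeat short noise recordings so a full-length fragment exists.
        noise = noise * math.ceil(n / len(noise))
    # Randomly choose where the cropped fragment starts.
    start = rng.randrange(len(noise) - n + 1)
    fragment = noise[start:start + n]
    # Average power of the clean speech and of the cropped fragment.
    clean_power = sum(x * x for x in clean) / n
    noise_power = sum(x * x for x in fragment) / n
    if noise_power == 0.0:
        # A silent fragment cannot be scaled; return the speech unchanged.
        return list(clean)
    # Gain that makes clean_power / scaled_noise_power match the target SNR.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * v for c, v in zip(clean, fragment)]
```

Because a different fragment is drawn on each call, the same clean utterance can be paired with many noise realizations during training, which is the property the abstract relies on for generalization to unseen noise.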