Abstract:
To address image degradation, a lack of learnable features, and the scarcity of abnormal samples in railway trackside cable inspection images, this paper proposes EGA-CLIP, a large multimodal model that introduces an Edge Guided Attention (EGA) module into the Contrastive Language-Image Pre-training (CLIP) architecture, together with an EGA-CLIP-based method for detecting appearance anomalies in trackside cable troughs. The anomaly detection pipeline combines Contrast Limited Adaptive Histogram Equalization (CLAHE) enhancement, YOLO (You Only Look Once) v11 localization, and Gaussian filtering to improve input image quality. Structural perception is strengthened through multi-scale fusion of Canny and Sobel edge features with Vision Transformer features, and the model outputs anomaly segmentation maps. Experimental results show that EGA-CLIP achieves 99.00% pixel-level area under the receiver operating characteristic curve (Pixel AUROC), 89.52% image-level AUROC (Image AUROC), and 99.19% accuracy, outperforming the comparison models. It also generalizes well in few-shot scenarios and can provide a reliable solution for railway trackside equipment inspection.
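For readers unfamiliar with the two AUROC figures reported above, the following is a minimal sketch of how pixel-level and image-level AUROC are commonly computed from anomaly score maps. It is not the paper's evaluation code. The function names (`auroc`, `pixel_auroc`, `image_auroc`) are hypothetical, and taking the maximum pixel score as the image-level score is an assumption; the abstract does not state which aggregation the authors use.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for anomalous, 0 for normal.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both normal and anomalous samples")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def pixel_auroc(score_maps, mask_maps):
    """Pixel AUROC: every pixel of every image is one sample."""
    flat_scores = [s for m in score_maps for row in m for s in row]
    flat_labels = [y for m in mask_maps for row in m for y in row]
    return auroc(flat_scores, flat_labels)


def image_auroc(score_maps, image_labels):
    """Image AUROC: one score per image.

    Assumption: the image score is the maximum pixel score of its map.
    """
    image_scores = [max(s for row in m for s in row) for m in score_maps]
    return auroc(image_scores, image_labels)
```

Under these assumptions, Pixel AUROC measures how well the segmentation map separates anomalous pixels from normal ones, while Image AUROC measures how well a single per-image score separates defective images from normal ones, which is why the two values can differ substantially for the same model.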