Abstract:
To address the low efficiency of searching for items lost on the railway, this paper proposes a railway lost-item retrieval method based on cross-modal image-text retrieval. Because existing methods built on open-source datasets adapt poorly to this domain, the paper constructs a dedicated dataset for railway lost-item retrieval. Starting from the CLIP (Contrastive Language-Image Pre-training) model, the method is fine-tuned with rsLoRA (rank-stabilized LoRA), FLIP (Fast Language-Image Pre-training), and GAT (Global Adversarial Training), and bidirectional re-ranking and model fusion strategies are introduced to further improve retrieval accuracy. Experimental results show that the proposed method reaches a mean recall of 87.01%, raises R@1 to 68.4%, and reduces memory usage by 54%, significantly outperforming the baseline. The method offers an efficient, practical technical solution for railway lost-item retrieval.
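As a rough illustration of how the reported retrieval metrics are commonly computed, the sketch below makes several assumptions. It takes a square similarity matrix in which query i's correct match is item i, and it defines mean recall as the average of R@1, R@5 and R@10 over both retrieval directions (text-to-image and image-to-text). This is a widely used convention in image-text retrieval, but the abstract does not state the paper's exact definition. The function names `recall_at_k` and `mean_recall` are hypothetical.

```python
import numpy as np

def recall_at_k(sim, k):
    """Hypothetical helper: fraction of queries (rows of sim) whose correct
    item, assumed to be on the diagonal sim[i, i], ranks within the top k."""
    correct = np.diag(sim)[:, None]
    # Rank of the correct item = number of items with a strictly higher score
    # (ties are resolved in favour of the correct item in this sketch).
    ranks = (sim > correct).sum(axis=1)
    return float((ranks < k).mean())

def mean_recall(sim, ks=(1, 5, 10)):
    """Hypothetical helper: average of R@K over both directions, using rows
    of sim for one direction and rows of sim.T for the other (assumed mR)."""
    scores = [recall_at_k(m, k) for m in (sim, sim.T) for k in ks]
    return float(np.mean(scores))
```

For example, with `sim` holding CLIP cosine similarities between text queries (rows) and images (columns), `recall_at_k(sim, 1)` gives text-to-image R@1, and `mean_recall(sim)` averages the six recall values.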