Railway lost item search method based on cross-modal image-text retrieval

WANG Keda; LIU Fei

doi:10.3969/j.issn.1005-8451.2025.12.04

Abstract

To address the low efficiency of searching for lost items on railways, this paper proposes a lost item search method based on cross-modal image-text retrieval. Because open-source datasets are poorly suited to this task, a dedicated dataset is constructed. The method builds on the CLIP (Contrastive Language-Image Pre-training) model, which is fine-tuned with techniques such as rsLoRA (rank-stabilized LoRA), FLIP (Feature-level Image Perturbation), and GAT (Global Adversarial Training); bidirectional re-ranking and model fusion strategies are then introduced to improve retrieval accuracy. Experimental results show that the proposed method reaches a mean recall of 87.01%, raises Recall@1 to 68.4%, and reduces GPU memory usage by 54%, significantly outperforming the baseline method. The method offers an efficient technical solution for locating lost items on railways.
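The abstract reports Recall@1 and a mean recall but does not define them. The sketch below shows one common way such retrieval metrics are computed from a query-by-candidate similarity matrix. It is an illustration only: the function names, the toy data, the single-ground-truth assumption, and the choice of averaging Recall@K over a set of K values are assumptions, not details taken from the paper.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], gt: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct candidate appears in the top-k results.

    sim[q][c] is the similarity between query q and candidate c.
    gt[q] is the index of the single correct candidate for query q
    (assumption: exactly one ground-truth match per query).
    """
    hits = 0
    for q, scores in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if gt[q] in ranked[:k]:
            hits += 1
    return hits / len(sim)


def mean_recall(sim: Sequence[Sequence[float]], gt: Sequence[int],
                ks: Sequence[int] = (1, 5, 10)) -> float:
    """Average of Recall@K over the given K values (assumed definition)."""
    return sum(recall_at_k(sim, gt, k) for k in ks) / len(ks)


# Toy text-to-image similarity matrix: 3 text queries x 4 candidate images.
toy_sim = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # query 1: correct image 1 is ranked second
    [0.5, 0.6, 0.1, 0.7],  # query 2: correct image 2 is ranked last
]
toy_gt = [0, 1, 2]

r1 = recall_at_k(toy_sim, toy_gt, 1)
r2 = recall_at_k(toy_sim, toy_gt, 2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

In a bidirectional setting, the same functions could be applied to the transposed matrix for image-to-text queries and the two directions averaged; whether the paper does this is not stated in the abstract.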
