Abstract:
To match textual and image information about lost railway items precisely, improve recovery rates, reduce costs and raise efficiency in railway passenger transportation, and enhance service quality and management, this paper proposes a retrieval method for lost railway items. The method first combines the images and textual descriptions of lost items and extracts their key features using information extraction techniques. It then employs the Chinese-CLIP (Contrastive Language-Image Pre-training) multimodal image-text retrieval model to map visual and textual information into a unified semantic space. Trained on real-world data, the method efficiently filters the most relevant results from a large volume of lost-item registration records, achieving accurate cross-modal matching. Experimental results show that, on the lost-item retrieval task, the method significantly outperforms traditional approaches in both retrieval accuracy and response speed, better meeting the needs of passengers and management personnel for efficient and convenient retrieval of lost items.
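
As a rough illustration of retrieval in a unified semantic space, the sketch below ranks registration records by cosine similarity to a query embedding. It is not the paper's implementation: it assumes the image and text embeddings have already been produced by an encoder such as Chinese-CLIP, and the record IDs, vectors, and function names (`cosine`, `retrieve`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def retrieve(query_vec, records, top_k=3):
    """Return the top_k (record_id, score) pairs, best match first."""
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings of registered lost items (toy 3-d vectors;
# real encoder outputs would have hundreds of dimensions).
records = {
    "rec-001": [0.9, 0.1, 0.0],
    "rec-002": [0.1, 0.8, 0.3],
    "rec-003": [0.7, 0.2, 0.1],
}

# Hypothetical embedding of a passenger's textual description.
query = [1.0, 0.0, 0.0]

results = retrieve(query, records, top_k=2)
for record_id, score in results:
    print(record_id, round(score, 3))
```

Because text and image embeddings share one space, the same ranking function would serve a text query against image records or an image query against text records. At a larger scale, an approximate nearest-neighbor index would likely replace the linear scan.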