Abstract:
This paper proposes a rail transit passenger flow detection and personnel search method based on an improved Grounding DINO model. It addresses the inability of traditional object detection models to meet the dynamic requirements of rail transit scenarios, a limitation caused by single-modality detection and a fixed set of categories. Three strategies are introduced: lightweight reconstruction of the network architecture, scene-specific adaptation training, and adaptive multi-scale feature fusion. Network pruning and quantization reduce computational complexity, scene adaptation combines a dedicated rail transit dataset with adversarial training, and a dynamic weighting mechanism strengthens multi-scale object detection. Experimental results show that, in the passenger flow detection task, the improved Grounding DINO model raises accuracy from 95.327% to 99.785% and recall from 45.452% to 96.711%. In the personnel search task, accuracy and recall increase by 17.91% and 89.729%, respectively. This study provides an efficient technical solution for the intelligent management of rail transit, and its multimodal fusion and scene optimization strategies offer new ideas for cross-domain object detection research.