Design and Implementation of a Retrieval-Augmented Question-Answering System for a Railway Industry Knowledge Base

骆铭; 林铭; 刘思行; 王立浩

doi:10.3969/j.issn.1005-8451.2025.12.07


Question-answering system for a railway industry knowledge base based on retrieval-augmented generation

Abstract

To address the high complexity of managing unstructured text materials in the railway industry and the low efficiency of traditional retrieval methods, this paper designs and implements a question-answering system for a railway industry knowledge base based on Retrieval-Augmented Generation (RAG). The overall architecture is built with the LangChain and Flask frameworks. Text data are cleaned according to predefined rules, and an industry-specific vector knowledge base is constructed with a dynamic text segmentation algorithm that, compared with traditional text segmentation algorithms, preserves more contextual semantic information. The system combines the Milvus vector database with the hybrid vector and keyword retrieval capability of the BGE M3-Embedding model, then applies metadata filtering to achieve efficient and accurate text retrieval. A domestically developed open-source Large Language Model (LLM) is deployed locally and combined with prompt engineering to support efficient knowledge retrieval and intelligent interaction, contributing to the digital and intelligent transformation of the railway industry.
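The hybrid retrieval step named in the abstract (vector plus keyword search, followed by metadata filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two result lists are combined, so reciprocal rank fusion is assumed here, the filter is applied before ranking for simplicity, and the `Chunk` class, the `hybrid_search` function, the `dept` metadata field, and the toy vectors are all hypothetical. In the described system, the dense and sparse representations would come from the BGE M3-Embedding model and the search would run inside Milvus.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """A hypothetical knowledge-base entry with toy embeddings."""
    text: str
    dense: List[float]                 # dense vector (toy values, not real embeddings)
    sparse: Dict[str, float]           # keyword -> weight
    meta: Dict[str, str] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sparse_dot(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Keyword score: dot product over the terms shared by query and chunk."""
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def rrf(rankings: List[List[int]], k: int = 60) -> Dict[int, float]:
    """Reciprocal rank fusion (assumed here; the source names no fusion method)."""
    scores: Dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank)
    return scores


def hybrid_search(chunks, q_dense, q_sparse, meta_filter, top_k=2):
    # 1. Metadata filter: keep only chunks matching every key/value pair.
    candidates = [i for i, c in enumerate(chunks)
                  if all(c.meta.get(key) == val for key, val in meta_filter.items())]
    # 2. Rank the candidates separately by dense and by keyword score.
    by_dense = sorted(candidates,
                      key=lambda i: cosine(q_dense, chunks[i].dense), reverse=True)
    by_sparse = sorted(candidates,
                       key=lambda i: sparse_dot(q_sparse, chunks[i].sparse), reverse=True)
    # 3. Fuse the two rankings and return the best chunks.
    fused = rrf([by_dense, by_sparse])
    ranked = sorted(candidates, key=lambda i: fused[i], reverse=True)
    return [chunks[i] for i in ranked[:top_k]]


chunks = [
    Chunk("Track inspection interval rules", [1.0, 0.0],
          {"track": 1.0, "inspection": 1.0}, {"dept": "maintenance"}),
    Chunk("Signal equipment maintenance guide", [0.6, 0.8],
          {"signal": 1.0, "maintenance": 1.0}, {"dept": "maintenance"}),
    Chunk("Ticket refund policy", [0.9, 0.1],
          {"ticket": 1.0, "refund": 1.0}, {"dept": "passenger"}),
]

results = hybrid_search(chunks, [1.0, 0.0], {"track": 1.0},
                        {"dept": "maintenance"}, top_k=3)
for chunk in results:
    print(chunk.text)
```

In this toy run the passenger-service chunk is excluded by the metadata filter even though its dense vector is close to the query, which is the effect metadata filtering is meant to have on retrieval precision.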
