Application of a Reinforcement Learning Algorithm in High-Speed Railway Operation Scheduling

Abstract: As the coverage and traffic density of China's high-speed railway (HSR) network continue to grow, adjusting the train operation plan has become increasingly complex. Using computers, artificial intelligence, and related techniques to help dispatchers formulate stage adjustment plans is a development trend in intelligent HSR dispatching. Adjusting the HSR operation plan is a multi-stage decision-making problem with a long decision chain, a large scale, and many constraints; as a result, Q-learning, a traditional reinforcement learning method, learns inefficiently and converges slowly. This paper proposes an intelligent HSR operation plan adjustment algorithm based on Q(λ)-learning. It uses accumulating eligibility traces to build a multi-step reward update mechanism, which effectively addresses slow convergence under sparse rewards. The objective function fully accounts for the track utilization plan, so it better reflects how arrival-departure tracks are used as traffic density increases. Simulation experiments show that the Q(λ)-learning algorithm outperforms conventional Q-learning in learning efficiency, convergence speed, and final convergence result.
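
The abstract does not define the paper's states, actions, or rewards, so the following is not the authors' implementation. It is a minimal, hypothetical sketch of the mechanism the abstract names: tabular Q(λ) with accumulating eligibility traces, here using Watkins's variant (an assumption; the abstract does not say which variant is used), on a toy chain whose only reward arrives at the terminal state. The environment and the names `step`, `run_episode`, `epsilon_greedy`, `train`, and all hyperparameters are illustrative assumptions. Setting `lam=0` turns the same update into one-step Q-learning, so the demo contrasts how far a single sparse reward propagates under each rule.

```python
import random
from typing import Callable, List

# Hypothetical toy problem (NOT the paper's railway model): a chain of states
# 0..N_STATES-1. Action 1 moves right, action 0 moves left. The only non-zero
# reward is +1 for reaching the last state, which ends the episode, so the
# reward is sparse, as at the end of a long multi-stage decision chain.
N_STATES = 6
ACTIONS = (0, 1)

QTable = List[List[float]]
Policy = Callable[[QTable, int], int]


def step(s: int, a: int):
    """Return (next_state, reward, done) for the toy chain."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def run_episode(Q: QTable, choose: Policy, lam: float, alpha: float = 0.1,
                gamma: float = 0.9, max_steps: int = 500) -> int:
    """One episode of Watkins's Q(lambda) with accumulating traces.

    With lam = 0 this reduces to one-step Q-learning. Returns the step count.
    """
    E = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # eligibility traces
    s, a = 0, choose(Q, 0)
    for t in range(1, max_steps + 1):
        s_next, r, done = step(s, a)
        if done:
            target, decay, a_next = r, 0.0, None  # terminal: no bootstrap
        else:
            a_next = choose(Q, s_next)
            target = r + gamma * max(Q[s_next])
            # Watkins's variant: keep traces only if the next action is greedy.
            decay = gamma * lam if Q[s_next][a_next] == max(Q[s_next]) else 0.0
        delta = target - Q[s][a]  # one TD error for this transition...
        E[s][a] += 1.0            # accumulating trace: add 1, never reset to 1
        for i in range(N_STATES):
            for j in ACTIONS:
                Q[i][j] += alpha * delta * E[i][j]  # ...credited to every traced pair
                E[i][j] *= decay
        if done:
            return t
        s, a = s_next, a_next
    return max_steps


def epsilon_greedy(epsilon: float, rng: random.Random) -> Policy:
    """Epsilon-greedy policy with random tie-breaking."""
    def choose(Q: QTable, s: int) -> int:
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(Q[s])
        return rng.choice([a for a in ACTIONS if Q[s][a] == best])
    return choose


def train(lam: float, episodes: int = 100, seed: int = 0) -> QTable:
    """Train a fresh Q-table on the toy chain with an epsilon-greedy policy."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    choose = epsilon_greedy(0.1, rng)
    for _ in range(episodes):
        run_episode(Q, choose, lam)
    return Q


def always_right(Q: QTable, s: int) -> int:
    """Scripted policy for a deterministic demo."""
    return 1


# One scripted episode from a zero-initialised table under each update rule.
q_one_step = [[0.0, 0.0] for _ in range(N_STATES)]
q_lambda = [[0.0, 0.0] for _ in range(N_STATES)]
run_episode(q_one_step, always_right, lam=0.0)
run_episode(q_lambda, always_right, lam=0.8)
print([round(row[1], 4) for row in q_one_step])  # [0.0, 0.0, 0.0, 0.0, 0.1, 0.0]
print([round(row[1], 4) for row in q_lambda])    # [0.0269, 0.0373, 0.0518, 0.072, 0.1, 0.0]
```

After one episode, one-step Q-learning has credited only the final state-action pair, whereas Q(λ) has assigned decaying credit, α(γλ)^k for the pair k steps before the reward, to every earlier pair on the trajectory. This is the kind of multi-step update the abstract credits with faster convergence under sparse rewards. `train(lam=0.0)` and `train(lam=0.8)` allow a rough comparison on the toy chain, but results there say nothing about the railway setting.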