Abstract:
With the expanding network coverage and increasing traffic density of China's high-speed railway, adjusting train operation plans has become more complex. Using computers, artificial intelligence, and other technical means to assist dispatchers in formulating phase adjustment plans is a development trend in intelligent high-speed railway dispatching. Adjusting a high-speed railway operation plan is a multi-stage decision-making problem characterized by a long decision chain, large scale, and many constraints; as a result, the traditional Q-Learning reinforcement learning algorithm suffers from low learning efficiency and slow convergence. This paper proposes a Q(λ)-Learning-based intelligent adjustment algorithm for high-speed railway operation plans, which uses accumulating eligibility traces to design a multi-step reward update mechanism, effectively mitigating slow convergence under sparse rewards. The track utilization scheme is fully considered in the design of the objective function, making the method better suited to managing arrival-departure track usage as traffic density increases. Simulation experiments show that the proposed method outperforms the traditional Q-Learning algorithm in learning efficiency, convergence speed, and convergence quality.
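The multi-step reward update via accumulating eligibility traces mentioned in the abstract is the core mechanism of tabular Q(λ)-Learning. As a rough illustration only (this is a generic Watkins's Q(λ) sketch on a hypothetical sparse-reward chain task; the environment, hyperparameters, and all names here are illustrative assumptions, not the paper's railway model or implementation), the mechanism might look like:

```python
import random

# Illustrative sketch of tabular Q(lambda)-Learning (Watkins's variant)
# with accumulating eligibility traces on a toy chain MDP that has a
# single sparse terminal reward. All environment details and
# hyperparameters are assumptions for demonstration purposes.

N_STATES = 5           # states 0..4; entering state 4 yields reward 1
ACTIONS = (0, 1)       # 0 = move left, 1 = move right
ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.9, 0.3

def step(s, a):
    """One environment transition; the reward is sparse (terminal only)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def greedy(Q, s, rng):
    """Greedy action with random tie-breaking."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        E = dict.fromkeys(Q, 0.0)            # eligibility traces
        s = 0
        a = rng.choice(ACTIONS) if rng.random() < EPS else greedy(Q, s, rng)
        for _ in range(100):                 # cap episode length
            s2, r, done = step(s, a)
            a2 = rng.choice(ACTIONS) if rng.random() < EPS else greedy(Q, s2, rng)
            a_star = greedy(Q, s2, rng)
            # One-step TD error, bootstrapped from the greedy successor.
            delta = r + (0.0 if done else GAMMA * Q[(s2, a_star)]) - Q[(s, a)]
            E[(s, a)] += 1.0                 # accumulating trace
            for k in Q:
                # Multi-step credit assignment: every recently visited
                # state-action pair shares in the current TD error.
                Q[k] += ALPHA * delta * E[k]
                # Decay traces; Watkins's Q(lambda) cuts them whenever the
                # chosen next action is not greedy (value comparison
                # handles ties under random tie-breaking).
                E[k] = GAMMA * LAM * E[k] if Q[(s2, a2)] == Q[(s2, a_star)] else 0.0
            s, a = s2, a2
            if done:
                break
    return Q

Q = train()
```

Because every visited state-action pair along the trajectory holds a decaying trace, a single terminal reward updates the whole recent decision chain in one sweep, which is what speeds convergence under sparse rewards compared with one-step Q-Learning.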