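Computer Engineering and ...
Abstract: To address the slow convergence and unstable training of the SAC (Soft Actor Critic) algorithm caused by equal-probability sampling and random network initialization, an improved algorithm, PE-SAC (Priority playback soft Actor Critic with expert), is proposed that combines prioritized replay with expert data. The algorithm partitions the sample pool according to sample value and pre-trains the network on expert data, which shrinks the unmanned vehicle's ineffective exploration space, reduces the number of trial-and-error attempts, and improves learning efficiency. A reward function for multiple obstacles is also designed to broaden the algorithm's applicability. Simulation experiments on the CARLA platform show that the proposed method controls the unmanned vehicle more safely in the environment, and that for the same number of training iterations its reward and convergence speed exceed those of TD3 (Twin Delayed Deep Deterministic policy gradient algorithm) and SAC. Finally, a radar point cloud map and PID (Proportional Integral Derivative) control are combined to narrow the gap between the simulation environment and real scenes, and the trained model is transferred to a low-speed campus unmanned vehicle to verify the algorithm's generality.
Keywords: deep reinforcement learning; unmanned driving control; real-world scenarios
Document code: A
CLC number: TP391
doi:.1002--0084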
Application of SAC-based Autonomous Vehicle Control Method
NING Qiang1, LIU Yuansheng1,2*, XIE Longyang3
of Smart City, Beijing Union University, Beijing 100101, China
Engineering Research Center of Smart Mechanical Innovation Design Service, Beijing 100101, China
Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
Abstract: To address the slow network convergence and unstable training caused by equal-probability sampling and random network initialization in the SAC (Soft Actor Critic) algorithm, an improved algorithm, PE-SAC (Priority playback soft Actor Critic with expert), is proposed that combines priority playback and expert data. The algorithm classifies the sample pool according to sample value, uses expert data to pre-train the network,