Episodic Semi-gradient Sarsa with Neural Network

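While trying to implement Episodic Semi-gradient Sarsa with a neural network as the approximator, I wondered how to choose the optimal action based on the currently learned weights of the network. If the action space is discrete, I can just calculate the estimated value of the different actions in the current state and choose the one that gives the maximum. But this does not seem to be the best way of solving the problem. Furthermore, it does not work if the action space can be continuous (like the acceleration of a self-driving car, for example).

So, basically I am wondering how to solve line 10 of Sutton's pseudo-code, "Choose A' as a function of q(S', ·, w)".

How are these problems typically solved? Can anyone recommend a good example of this algorithm using Keras?

Edit: Do I need to modify the pseudo-code when using a network as the approximator? So that I simply minimize the MSE of the prediction of the network and the reward R, for example?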

I wondered how I choose the optimal action based on the currently learned weights of the network

You have three basic choices:

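1. Run the network multiple times, once for each possible value of A' to go with the S' value that you are considering. Take the maximum value as the predicted optimum action (with probability 1-ε, otherwise choose randomly, for the ε-greedy policy typically used in SARSA).

2. Design the network to estimate all action values at once, i.e. to have |A(s)| outputs (perhaps padded to cover "impossible" actions that you need to filter out). This alters the gradient calculations slightly: zero gradient should be applied to the inactive outputs of the last layer (i.e. anything not matching the A of (S,A)). Again, just pick the maximum valid output as the estimated optimum action. This is more efficient than running the network multiple times. It is also the approach used by the recent DQN Atari game-playing work and by AlphaGo's policy networks.

3. Use a policy-gradient method, which works by using samples to estimate the gradient that would improve a policy estimator. See chapter 13 of Sutton and Barto's second edition of Reinforcement Learning: An Introduction for more details. Policy-gradient methods become attractive when there are large numbers of possible actions, and they can cope with continuous action spaces (by estimating the distribution function for the optimal policy, e.g. choosing the mean and standard deviation of a normal distribution, which you can sample from to take your action). You can also combine policy-gradient with a state-value approach in actor-critic methods, which can be more efficient than pure policy-gradient approaches.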
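As a rough illustration of the first two choices, here is a minimal sketch of ε-greedy selection over a vector of action-value estimates. It assumes the estimates for S' are already available as a plain list (for example, the output of a multi-output network, or the results of several single-action forward passes). The function name `epsilon_greedy` and the `valid_actions` argument are my own illustration, not part of Sutton's pseudo-code or of any library.

```python
import random


def epsilon_greedy(q_values, valid_actions, epsilon, rng=random):
    """Pick an action index from one vector of action-value estimates.

    q_values:      list of floats, one estimate per action for the state S'
                   (hypothetical interface, e.g. a network's output vector).
    valid_actions: list of action indices that are allowed in this state;
                   padded / "impossible" outputs are simply left out.
    epsilon:       probability of taking a uniformly random valid action.
    """
    if rng.random() < epsilon:
        # Explore: uniform choice among the valid actions only.
        return rng.choice(valid_actions)
    # Exploit: highest estimate among valid actions (first one wins on ties).
    return max(valid_actions, key=lambda a: q_values[a])


# Four outputs, where action 3 is a padded output that must be filtered out
# even though its raw estimate happens to be the largest.
q_next = [0.2, 1.5, -0.3, 9.9]
a_next = epsilon_greedy(q_next, valid_actions=[0, 1, 2], epsilon=0.0)
```

With `epsilon=0.0` the call is purely greedy, so `a_next` is the index of the best valid estimate. In a training loop you would typically use a small positive ε so that the policy keeps exploring.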

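Note that if your action space is continuous, you don't have to use a policy-gradient method; you could just quantise the actions. Also, in some cases, even when actions are in theory continuous, you may find that the optimal policy only involves using extreme values (the classic mountain car example falls into this category: the only useful actions are maximum acceleration and maximum backwards acceleration).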
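Quantising can be as simple as laying an evenly spaced grid over the allowed range. The sketch below assumes a one-dimensional action (such as an acceleration) bounded by `low` and `high`. The helper name `make_action_grid` and the range [-1, 1] are made up for illustration.

```python
def make_action_grid(low, high, n_actions):
    """Return n_actions evenly spaced values covering [low, high]."""
    if n_actions < 2:
        raise ValueError("need at least two actions to span the range")
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


# Five discrete accelerations between full reverse and full forward.
accelerations = make_action_grid(-1.0, 1.0, 5)

# Extremes only, which may be all a mountain-car-like problem needs.
extremes = make_action_grid(-1.0, 1.0, 2)
```

A discrete selection rule then returns an index i, and `accelerations[i]` is the value actually sent to the environment. How fine the grid needs to be depends on the problem, and a coarse grid is often worth trying first.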

Do I need to modify the pseudo-code when using a network as the approximator? So, that I simply minimize the MSE of the prediction of the network and the reward R for example?

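No. There is no separate loss function in the pseudo-code, such as the MSE you would see used in supervised learning. The error term (often called the TD error) is given by the part in square brackets, and achieves a similar effect. Literally, the term ∇q(S,A,w) (sorry for the missing hat, no LaTeX on SO) means the gradient of the estimator itself, not the gradient of any loss function.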
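To make the update concrete, here is a minimal sketch of one semi-gradient Sarsa step. It uses a linear estimator with one weight row per action as a stand-in for the network, an assumption chosen so that the gradient can be written by hand. With a neural network, the same gradient of the estimate would come from backpropagation instead. The names `q_hat` and `sarsa_update` and the toy feature vectors are illustrative only.

```python
def q_hat(w, x, a):
    """Linear estimate: dot product of action a's weight row with features x."""
    return sum(wi * xi for wi, xi in zip(w[a], x))


def sarsa_update(w, x, a, r, x_next, a_next, alpha, gamma, terminal=False):
    """Apply one semi-gradient Sarsa step to w in place; return the TD error."""
    # The bootstrapped target is treated as a constant: no gradient is taken
    # through q_hat(S', A', w), which is what makes this "semi"-gradient.
    target = r if terminal else r + gamma * q_hat(w, x_next, a_next)
    td_error = target - q_hat(w, x, a)
    # Gradient of q_hat(S, A, w) w.r.t. w is x in row a and zero in every
    # other row, so only the taken action's weights move.
    for i, xi in enumerate(x):
        w[a][i] += alpha * td_error * xi
    return td_error


w = [[0.0, 0.0], [0.0, 0.0]]        # 2 actions x 2 features, all zeros
x, x_next = [1.0, 0.0], [0.0, 1.0]  # toy feature vectors for S and S'
delta = sarsa_update(w, x, a=0, r=1.0, x_next=x_next, a_next=1,
                     alpha=0.5, gamma=0.9)
```

Only the row for the action actually taken changes, which mirrors the "zero gradient for inactive outputs" rule for a multi-output network. If you do use a supervised-style library, one common way to get the same update direction is to hold the target R + γq(S',A',w) fixed and take a gradient step on half the squared difference between it and q(S,A,w). The regression target is then the bootstrapped value, not the reward R alone.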

