What is the importance of the reward policy in reinforcement learning?

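We assign a reward of +1 for reaching the goal and -1 for reaching an undesired state.

Is it also necessary to give a small reward, say +0.01, for an action that moves closer to the goal, and -0.01 for an action that does not?

What significant difference would such a reward policy make?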

From Sutton and Barto's book, Section 3.2, Goals and Rewards:

It is thus critical that the rewards we set up truly indicate what we want accomplished. In particular, the reward signal is not the place to impart to the agent prior knowledge about how to achieve what we want it to do. For example, a chess-playing agent should be rewarded only for actually winning, not for achieving subgoals such as taking its opponent's pieces or gaining control of the center of the board. If achieving these sorts of subgoals were rewarded, then the agent might find a way to achieve them without achieving the real goal. For example, it might find a way to take the opponent's pieces even at the cost of losing the game. The reward signal is your way of communicating to the robot what you want it to achieve, not how you want it achieved.

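So, in general, it is a good idea to avoid introducing prior knowledge through the reward function, because it can lead to undesired behaviour.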

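However, it is well known that guiding the agent's learning process through the reward function can improve RL performance. In fact, in some complex tasks it is necessary to first guide the agent towards a secondary (easier) goal, and then change the reward so that it learns the primary goal. This technique is known as reward shaping. An old but interesting example can be found in the paper by Randløv and Alstrøm: Learning to Drive a Bicycle using Reinforcement Learning and Shaping.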
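
To make the trade-off concrete, here is a minimal sketch comparing the ±0.01 bonus from the question with a potential-based bonus of the form γΦ(s') − Φ(s). Potential-based shaping is not mentioned above; it is brought in here as one commonly cited way of adding guidance without changing which policies are optimal (Ng, Harada and Russell, 1999). The corridor environment, the constants `GOAL`, `PIT` and `GAMMA`, and the function `potential` are all hypothetical choices made for illustration only.

```python
# Hypothetical 1-D corridor with states 0..5 (an assumption for illustration).
GOAL, PIT = 5, 0
GAMMA = 0.9  # assumed discount factor


def sparse_reward(next_state):
    """+1 for reaching the goal, -1 for the undesired state, 0 otherwise."""
    if next_state == GOAL:
        return 1.0
    if next_state == PIT:
        return -1.0
    return 0.0


def naive_bonus(state, next_state):
    """The scheme from the question: +0.01 if the step moves closer, else -0.01."""
    closer = abs(GOAL - next_state) < abs(GOAL - state)
    return 0.01 if closer else -0.01


def potential(state):
    """Hypothetical potential: grows as the state gets closer to the goal."""
    return 0.01 * (GOAL - abs(GOAL - state))


def potential_bonus(state, next_state):
    """Potential-based shaping term: GAMMA * potential(s') - potential(s)."""
    return GAMMA * potential(next_state) - potential(state)


def discounted_sum(bonus, path):
    """Discounted sum of a bonus function along a list of visited states."""
    steps = zip(path, path[1:])
    return sum(GAMMA ** t * bonus(s, s2) for t, (s, s2) in enumerate(steps))


# An agent that oscillates between states 2 and 3 and never reaches the goal.
loop = [2, 3, 2, 3, 2]
print("naive bonus on loop:    ", discounted_sum(naive_bonus, loop))
print("potential bonus on loop:", discounted_sum(potential_bonus, loop))
```

In this sketch the oscillating path collects a small positive discounted bonus under the naive scheme even though it never reaches the goal, which is the kind of side effect the Sutton and Barto passage warns about. The potential-based bonus telescopes to `GAMMA**T * potential(end) - potential(start)` for a path of `T` steps, so it depends only on where the path starts and ends and how long it is, not on the detours in between.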