How does DeepMind reduce the computation of Q-values for Atari games?
We know that Q-learning requires a huge amount of computation:
A game AI needs far more Q-values than an OX (tic-tac-toe) or Go AI.
How are all of these Q-values computed?
Thanks.
MCTS does not actually reduce any computation of Q-values.
Even a very simple Atari game AI would need far more than 3^(19x19) Q-values.
Look into Deep Q-Networks; they address your question.
We could represent our Q-function with a neural network that takes
the state (four game screens) and an action as input and outputs the
corresponding Q-value. Alternatively, we could take only the game
screens as input and output the Q-value for each possible action. This
approach has the advantage that if we want to perform a Q-value
update or pick the action with the highest Q-value, we only have to do
one forward pass through the network and have all Q-values for all
actions immediately available.
https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
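
To make the second design concrete, here is a minimal PyTorch sketch of such a network. The layer shapes follow the standard DQN architecture (four stacked 84x84 screens in, one Q-value per action out); the action count and hyperparameters are illustrative assumptions, not taken from the post.

    import torch
    import torch.nn as nn

    class DQN(nn.Module):
        """Q-network: maps a stack of 4 game screens to one Q-value per action."""
        def __init__(self, num_actions: int):
            super().__init__()
            # Convolutional layers process the raw 4x84x84 screen stack.
            self.conv = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            # Fully connected head emits all Q-values in one forward pass.
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, num_actions),
            )

        def forward(self, screens: torch.Tensor) -> torch.Tensor:
            return self.head(self.conv(screens))

    # One forward pass yields Q-values for every action, so choosing the
    # greedy action is a single argmax instead of one evaluation per action.
    net = DQN(num_actions=6)            # illustrative; e.g. Pong has 6 actions
    state = torch.zeros(1, 4, 84, 84)   # batch of one stacked state
    q_values = net(state)               # shape: (1, 6)
    action = q_values.argmax(dim=1)

This is why the per-state cost no longer depends on enumerating a Q-table: the network generalizes across states, and the Q-values for all actions come out of a single pass.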