How does DeepMind reduce the computation of Q-values for Atari games?

We know that Q-learning requires a lot of computation:

for a game AI, it needs many more Q-values than a tic-tac-toe (OX) game or even a Go game.

How are all of these Q-values computed?

Thanks.

MCTS does not actually reduce the computation of Q-values at all.

Even a very simple Atari game AI needs far more than 3^(19x19) Q-values.
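
To see why a lookup table is hopeless, here is a quick back-of-the-envelope check (a minimal sketch in Python; the Atari bound assumes the standard DQN preprocessing of four stacked 84x84 8-bit grayscale frames, which is an assumption, not something stated in the question):

```python
import math

# Naive upper bounds on state counts (illustrative assumptions: 3 symbols
# per intersection for Go; four stacked 84x84 8-bit frames for Atari).
go_digits = (19 * 19) * math.log10(3)            # Go: ~10^172 states
atari_digits = (84 * 84 * 4) * math.log10(256)   # Atari pixels: ~10^67970 states
print(f"Go upper bound:    ~10^{go_digits:.0f} states")
print(f"Atari upper bound: ~10^{atari_digits:.0f} states")
# A Q-table over either state space cannot fit in memory, which is the
# motivation for approximating Q with a function approximator instead.
```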

Take a look at deep Q-networks (DQN); they address exactly this problem.

We could represent our Q-function with a neural network that takes the state (four game screens) and an action as input and outputs the corresponding Q-value. Alternatively, we could take only the game screens as input and output the Q-value for each possible action. This approach has the advantage that if we want to perform a Q-value update or pick the action with the highest Q-value, we only have to do one forward pass through the network and immediately have the Q-values for all actions available.

https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
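
As an illustration of that second architecture, here is a minimal sketch assuming PyTorch (the layer sizes follow the original DQN paper's convolutional stack; `num_actions=18` and the random input tensor are just for demonstration):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Maps a stack of four 84x84 game screens to one Q-value per action."""

    def __init__(self, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # one output per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

net = DQN(num_actions=18)           # Atari exposes at most 18 discrete actions
state = torch.rand(1, 4, 84, 84)    # a batch of one stacked observation
q_values = net(state)               # shape (1, 18): every action in one pass
greedy_action = q_values.argmax(dim=1)
```

A single forward pass yields the Q-values for every action at once, so both picking the greedy action and computing the max over actions for the Q-learning target cost one network evaluation instead of one per action.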