DOUBLE DQN doesn't make any sense

Why use 2 networks, training once per episode and updating the target network every N episodes, when we could use 1 network and just train it once every N episodes? There's no difference at all!

What you are describing is not Double DQN. The periodically updated target network is a core feature of the original DQN algorithm (and all of its derivatives). DeepMind's classic paper explains why having two networks is essential:

The second modification to online Q-learning aimed at further improving the stability of our method with neural networks is to use a separate network for generating the targets y_j in the Q-learning update. More precisely, every C updates we clone the network Q to obtain a target network Q^ and use Q^ for generating the Q-learning targets y_j for the following C updates to Q. This modification makes the algorithm more stable compared to standard online Q-learning, where an update that increases Q(s_t, a_t) often also increases Q(s_{t+1}, a) for all a and hence also increases the target y_j, possibly leading to oscillations or divergence of the policy. Generating the targets using an older set of parameters adds a delay between the time an update to Q is made and the time the update affects the targets y_j, making divergence or oscillations much more unlikely.
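The mechanism described in the quote can be sketched in a few lines. This is a minimal toy illustration (a linear Q-function on random transitions, not the paper's CNN or replay buffer; all sizes and hyperparameters here are made up) showing the key point: the targets y_j are computed from a *frozen* copy of the parameters, which is cloned from the online network only every C updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q-function: Q(s, a) = w[a] @ s  (illustrative only)
n_states, n_actions = 4, 2
w_online = rng.normal(size=(n_actions, n_states)) * 0.1
w_target = w_online.copy()       # Q^ starts as a clone of Q

gamma, lr, C = 0.99, 0.01, 100   # C = clone period, as in the paper

def q_values(w, s):
    return w @ s

for step in range(1, 501):
    # Fake transition (s, a, r, s') drawn at random for illustration
    s = rng.normal(size=n_states)
    a = rng.integers(n_actions)
    r = rng.normal()
    s_next = rng.normal(size=n_states)

    # The target y_j uses the FROZEN parameters w_target, so this
    # update to w_online does not immediately move its own target.
    y = r + gamma * q_values(w_target, s_next).max()
    td_error = y - q_values(w_online, s)[a]
    w_online[a] += lr * td_error * s   # semi-gradient update on Q only

    # Every C updates, clone the online network into the target network.
    if step % C == 0:
        w_target = w_online.copy()
```

Note that only one network is ever trained; the "second network" is just a periodically refreshed copy used to hold the targets still between clones. That is why it cannot be replaced by "one network trained every N episodes": with a single network, every update to Q(s_t, a_t) would immediately shift the targets y_j as well, which is exactly the instability the quote warns about.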