Convolutional neural network: how to train it? (unsupervised)
I am trying to implement a CNN to play a game.
I am using python with theano/lasagne. I have built the network and am now trying to figure out how to train it.
So right now I have a batch of 32 states and, for each state in that batch, the action taken and the expected reward for that action.
How do I now train the network so that it learns that these actions in these states lead to these rewards?
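Concretely, the batch I am working with looks roughly like this (the shapes are illustrative assumptions, not the exact dimensions of my network):

import numpy as np

# illustrative shapes only -- the real frame size and number of actions
# come from the game, these values are assumptions
BATCH_SIZE = 32
NUM_ACTIONS = 4
FRAME_SHAPE = (80, 80, 1)   # height, width, channels (rolled to channels-first later)

states = np.zeros((BATCH_SIZE,) + FRAME_SHAPE, dtype=np.float32)      # one game frame per sample
actions = np.random.randint(0, NUM_ACTIONS, size=BATCH_SIZE)          # index of the action taken in each state
expected_rewards = np.zeros(BATCH_SIZE, dtype=np.float32)             # target reward for that action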
EDIT: to clarify my question.
Here is my full code: http://pastebin.com/zY8w98Ng
Snake import: http://pastebin.com/fgGCabzR
This is the part I am stuck on:
def _train(self):
    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.ivector('targets')
    states = T.tensor4('states')

    print "sampling mini batch..."
    # sample a mini_batch to train on
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE)

    # get the batch variables
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch]
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch]
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch]
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch])
    agents_expected_reward = []

    # print np.rollaxis(current_states, 3, 1).shape
    print "compiling current states..."
    current_states = np.rollaxis(current_states, 3, 1)
    current_states = theano.compile.sharedvalue.shared(current_states)

    print "getting network output from current states..."
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states)

    print "rewards adding..."
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            # this was a terminal frame, so there is no future reward to discount
            agents_expected_reward.append(rewards[i])
        else:
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))

    # figure out how to train the model (self._output_layer) with previous_states,
    # actions and agents_expected_reward
I want to update the model using previous_states, actions and agents_expected_reward so that it learns that those actions lead to those rewards.
I expect it to look something like this:
train_model = theano.function(inputs=[input_var],
                              outputs=self._output_layer,
                              givens={
                                  states: previous_states,
                                  rewards: agents_expected_reward,
                                  expected_rewards: agents_expected_reward
                              })
I just don't understand how the givens would affect the model, since I never specified them when building the network. I also can't find this explained in the theano or lasagne documentation.
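As far as I can tell from experimenting, givens just substitutes one symbolic variable for another value when the function is compiled, as in this tiny standalone sketch (toy variables, nothing to do with my network):

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')            # symbolic input
y = x * 2                    # expression built from x

# givens replaces x with the shared variable at compile time,
# so the compiled function takes no explicit input for x
data = theano.shared(np.asarray([1.0, 2.0, 3.0], dtype=theano.config.floatX))
f = theano.function(inputs=[], outputs=y, givens={x: data})
print f()                    # prints [ 2.  4.  6.]

What I don't see is how that substitution connects back to the layers I built with lasagne.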
So how can I update the model/network so that it actually 'learns'?
If this is still unclear, please comment on what information is needed. I have been struggling with this for a few days now.
After going through the docs I finally found the answer. I had been looking in the wrong place before.
network = self._output_layer
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)

# `expected` and `real_rewards` are Theano symbolic variables declared
# elsewhere, in the same way as `states` above
givens = {
    states: current_states,
    expected: agents_expected_reward,
    real_rewards: rewards
}

train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, on_unused_input='warn',
                           givens=givens,
                           allow_input_downcast=True)

train_fn(current_states, agents_expected_reward)
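For anyone else who gets stuck here, the recipe is the standard Lasagne training pattern: build the symbolic loss from get_output, collect the trainable parameters, turn a gradient step into updates, and compile everything into one theano.function that you call once per mini-batch. Below is a minimal self-contained sketch of that pattern (toy network, toy data, made-up sizes; it uses squared_error to keep the toy targets simple, whereas my code above uses categorical_crossentropy):

import numpy as np
import theano
import theano.tensor as T
import lasagne

# toy 2-layer network: 8 inputs -> 16 hidden units -> 4 outputs (e.g. one value per action)
input_var = T.matrix('inputs')
target_var = T.matrix('targets')
l_in = lasagne.layers.InputLayer(shape=(None, 8), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=16,
                                  nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hid, num_units=4,
                                  nonlinearity=None)

# symbolic loss over the mini-batch
prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.squared_error(prediction, target_var).mean()

# one SGD step on all trainable parameters, compiled into a single function
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)
train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, allow_input_downcast=True)

# one gradient step on a random toy batch
xs = np.random.rand(32, 8).astype(np.float32)
ys = np.random.rand(32, 4).astype(np.float32)
print train_fn(xs, ys)

Calling the compiled function is what actually applies the SGD update to the network parameters; the givens only decide where the symbolic inputs get their values from.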