Randomly changing for loop values
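I've been working on a deep Q-learning snake game in my spare time, and I plan to add a genetic algorithm component. To that end, I'm setting up loops that let me create a population of snakes of a given size, each of which runs for a set number of episodes, over a set number of generations.
It should be simple: just some nested for loops. Except my for loops are giving me some pretty crazy results.
Here is the code in question: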
def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0
    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()
    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2
            # Set agent number
            game.agent_num = agent_num
            for cur_episode in range(1, max_episodes+1):
                # Get old state
                state_old = agent.get_state(game)
                # Get move
                final_move = agent.get_action(state_old)
                # Perform move and get new state
                reward, done, score = game.play_step(final_move)
                state_new = agent.get_state(game)
                # Train short memory
                agent.train_short_memory(state_old, final_move, reward, state_new, done)
                # Remember
                agent.remember(state_old, final_move, reward, state_new, done)
                # Snake died
                if done:
                    # Train long memory, plot result
                    game.reset()
                    agent.episode = cur_episode
                    game.agent_episode = cur_episode
                    agent.train_long_memory()
                    if score > game.top_score:
                        game.top_score = score
                        agent.model.save()
                    total_score += score
                    game.mean_score = np.round((total_score / cur_episode), 3)
                    print(f"Agent{game.agent_num}")
                    print(f"Episode: {cur_episode}")
                    print(f"Generation: {cur_gen}")
                    print(f"Score: {score}")
                    print(f"Top Score: {game.top_score}")
                    print(f"Mean: {game.mean_score}\n")
This is the output it gives:
Agent0
Episode: 3
Generation: 7
Score: 0
Top Score: 0
Mean: 0.0
Agent0
Episode: 3
Generation: 14
Score: 0
Top Score: 0
Mean: 0.0
Agent0
Episode: 7
Generation: 20
Score: 1
Top Score: 1
Mean: 0.143
Agent0
Episode: 10
Generation: 26
Score: 0
Top Score: 1
Mean: 0.1
Agent0
Episode: 6
Generation: 28
Score: 1
Top Score: 1
Mean: 0.333
Agent0
Episode: 5
Generation: 37
Score: 0
Top Score: 1
Mean: 0.4
Agent0
Episode: 3
Generation: 43
Score: 0
Top Score: 1
Mean: 0.667
Agent0
Episode: 1
Generation: 45
Score: 1
Top Score: 1
Mean: 3.0
Agent0
Episode: 2
Generation: 49
Score: 0
Top Score: 1
Mean: 1.5
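The generation number climbs steadily every second until it reaches 49 and the loop ends, while the episode number changes seemingly at random each time the snake dies. It's very strange. I have never seen anything like this and have no idea what in my code could be causing it.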
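Answer:
For everyone who doesn't want to read through the comments:
Eli Harold helped me work this out. The problem was that my code treated each episode as a single frame of the game. So instead of a snake's entire lifetime (a whole game) being one episode, every single move the snake made counted as an episode.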
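To make the frame-versus-episode distinction concrete, here is a minimal sketch that reproduces the symptom without the real game. `StubGame` and both helper functions are hypothetical stand-ins, not part of the original project; the stub simply ends each game after a random number of steps.

```python
import random


class StubGame:
    """Hypothetical stand-in for SnakeGameAI: each game lasts a random number of steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # A new snake survives between 3 and 8 steps (an arbitrary assumption).
        self.steps_left = self.rng.randint(3, 8)

    def play_step(self):
        """Advance one frame and return True when the snake dies."""
        self.steps_left -= 1
        return self.steps_left == 0


def deaths_without_inner_loop(game, max_episodes):
    """One play_step per loop iteration: cur_episode really counts frames."""
    reported = []
    for cur_episode in range(1, max_episodes + 1):
        if game.play_step():
            game.reset()
            reported.append(cur_episode)
    return reported


def deaths_with_inner_loop(game, max_episodes):
    """Play each game to completion before the episode counter advances."""
    reported = []
    for cur_episode in range(1, max_episodes + 1):
        while not game.play_step():
            pass  # keep stepping until the snake dies
        game.reset()
        reported.append(cur_episode)
    return reported


if __name__ == "__main__":
    print(deaths_without_inner_loop(StubGame(seed=1), 10))
    print(deaths_with_inner_loop(StubGame(seed=1), 10))
```

In the first function the reported "episode" is just whichever frame the snake happened to die on, which is why the numbers in the question look random. In the second, every episode from 1 to `max_episodes` is reported exactly once.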
Here is what my code looks like now. I added a run loop, which fixed the problem.
def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0
    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()
    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2
            # Set agent number
            game.agent_num = agent_num
            for cur_episode in range(1, max_episodes+1):
                run = True
                while run:
                    # Get old state
                    state_old = agent.get_state(game)
                    # Get move
                    final_move = agent.get_action(state_old)
                    # Perform move and get new state
                    reward, done, score = game.play_step(final_move)
                    state_new = agent.get_state(game)
                    # Train short memory
                    agent.train_short_memory(state_old, final_move, reward, state_new, done)
                    # Remember
                    agent.remember(state_old, final_move, reward, state_new, done)
                    # Snake died
                    if done:
                        run = False
                        # Train long memory, plot result
                        game.reset()
                        agent.episode = cur_episode
                        game.agent_episode = cur_episode
                        agent.train_long_memory()
                        if score > game.top_score:
                            game.top_score = score
                            agent.model.save()
                        total_score += score
                        game.mean_score = np.round((total_score / cur_episode), 3)
                        print(f"Agent{game.agent_num}")
                        print(f"Episode: {cur_episode}")
                        print(f"Generation: {cur_gen}")
                        print(f"Score: {score}")
                        print(f"Top Score: {game.top_score}")
                        print(f"Mean: {game.mean_score}\n")
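One thing the run loop does not change: `total_score` keeps accumulating across agents and generations, while `cur_episode` restarts at 1 for each agent in each generation, so `total_score / cur_episode` can exceed the top score (as in the `Mean: 3.0` line of the question's output). If a mean over all finished games is what is wanted, one possible approach is to count finished games separately. The `RunningMean` helper below is a hypothetical sketch, not part of the original code.

```python
class RunningMean:
    """Hypothetical helper: mean score over every finished game so far."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, score):
        """Record one finished game and return the updated mean."""
        self.total += score
        self.count += 1
        return round(self.total / self.count, 3)


if __name__ == "__main__":
    stats = RunningMean()
    for score in [0, 0, 1, 0]:
        latest = stats.add(score)
    print(latest)  # 0.25: one point over four finished games
```

Under this assumption, `game.mean_score = stats.add(score)` would replace the `total_score` bookkeeping inside the `if done:` branch, with `stats` created once at the top of `run`.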