Randomly changing for loop values

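In my spare time I've been working on a deep Q-learning Snake game, and I plan to add a genetic algorithm component. To that end, I'm setting up loops that let me create a population of snakes of a given size, with each snake running for a set number of episodes, over a set number of generations.

It should be simple: just some nested for loops. Except my for loops are giving me some pretty crazy results.

Here is the code in question: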

import numpy as np

# Agent and SnakeGameAI are defined elsewhere in the project.
def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0

    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()

    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2

            # Set agent number
            game.agent_num = agent_num

            for cur_episode in range(1, max_episodes+1):
                # Get old state
                state_old = agent.get_state(game)

                # Get move
                final_move = agent.get_action(state_old)

                # Perform move and get new state
                reward, done, score = game.play_step(final_move)
                state_new = agent.get_state(game)

                # Train short memory
                agent.train_short_memory(state_old, final_move, reward, state_new, done)

                # Remember
                agent.remember(state_old, final_move, reward, state_new, done)

                # Snake died
                if done:
                    # Train long memory, plot result
                    game.reset()
                    agent.episode = cur_episode
                    game.agent_episode = cur_episode
                    agent.train_long_memory()

                    if score > game.top_score:
                        game.top_score = score
                        agent.model.save()

                    total_score += score
                    game.mean_score = np.round((total_score / cur_episode), 3)
                    
                    print(f"Agent{game.agent_num}")
                    print(f"Episode: {cur_episode}")
                    print(f"Generation: {cur_gen}")
                    print(f"Score: {score}")
                    print(f"Top Score: {game.top_score}")
                    print(f"Mean: {game.mean_score}\n")

Here is the output it gives:

Agent0
Episode: 3
Generation: 7
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 3
Generation: 14
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 7
Generation: 20
Score: 1
Top Score: 1
Mean: 0.143

Agent0
Episode: 10
Generation: 26
Score: 0
Top Score: 1
Mean: 0.1

Agent0
Episode: 6
Generation: 28
Score: 1
Top Score: 1
Mean: 0.333

Agent0
Episode: 5
Generation: 37
Score: 0
Top Score: 1
Mean: 0.4

Agent0
Episode: 3
Generation: 43
Score: 0
Top Score: 1
Mean: 0.667

Agent0
Episode: 1
Generation: 45
Score: 1
Top Score: 1
Mean: 3.0

Agent0
Episode: 2
Generation: 49
Score: 0
Top Score: 1
Mean: 1.5

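The generation number climbs steadily every second until it reaches 49 and the loop ends, while the episode number changes seemingly at random each time the snake dies. This is strange. I've never seen anything like it, and I have no idea what in my code could be causing it.

Answer: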

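For anyone who doesn't want to read through the comments: Eli Harold helped me solve this. The problem was that my code treated each episode as a single frame of the game. So instead of the snake's whole lifetime (an entire game) counting as one episode, every single move the snake made counted as an episode. With max_episodes=10, the episode loop finished after 10 moves and the generation counter advanced, which is why the generation kept climbing and the episode number printed at each death looked random.

Here is what my code looks like now. I added a run loop (while run:) inside the episode loop, which solved the problem.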
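To make the difference concrete, here is a minimal sketch that compares the two loop layouts. It is not the real game: ToyGame is a hypothetical stand-in for SnakeGameAI in which the snake simply dies after a fixed number of moves, and the two function names are made up for illustration.

```python
from itertools import cycle


class ToyGame:
    """Hypothetical stand-in for SnakeGameAI: the snake dies after a fixed number of moves."""

    def __init__(self, lifetimes):
        self._lifetimes = cycle(lifetimes)
        self.reset()

    def reset(self):
        # Start a new game with the next lifetime in the list.
        self._moves_left = next(self._lifetimes)

    def play_step(self):
        # Play one move; return True when the snake has just died.
        self._moves_left -= 1
        return self._moves_left == 0


def one_step_per_episode(game, max_episodes, max_generations):
    """Original layout: each pass of the episode loop plays a single move."""
    deaths = []
    for gen in range(max_generations):
        for episode in range(1, max_episodes + 1):
            if game.play_step():
                game.reset()
                deaths.append((gen, episode))
    return deaths


def one_game_per_episode(game, max_episodes, max_generations):
    """Fixed layout: an inner while loop plays until the snake dies."""
    deaths = []
    for gen in range(max_generations):
        for episode in range(1, max_episodes + 1):
            done = False
            while not done:
                done = game.play_step()
            game.reset()
            deaths.append((gen, episode))
    return deaths


buggy = one_step_per_episode(ToyGame([13, 7, 25]), max_episodes=10, max_generations=5)
fixed = one_game_per_episode(ToyGame([13, 7, 25]), max_episodes=3, max_generations=2)
print(buggy)  # [(1, 3), (1, 10), (4, 5)]
print(fixed)  # [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3)]
```

In the first layout the printed (generation, episode) pair is just the move counter split into blocks of max_episodes, so it depends on how long each snake happened to survive. In the second, every episode ends with exactly one death, so the pairs count up in order.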

import numpy as np

# Agent and SnakeGameAI are defined elsewhere in the project.
def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0

    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()

    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2

            # Set agent number
            game.agent_num = agent_num

            for cur_episode in range(1, max_episodes+1):
                run = True
                while run:
                    # Get old state
                    state_old = agent.get_state(game)

                    # Get move
                    final_move = agent.get_action(state_old)

                    # Perform move and get new state
                    reward, done, score = game.play_step(final_move)
                    state_new = agent.get_state(game)

                    # Train short memory
                    agent.train_short_memory(state_old, final_move, reward, state_new, done)

                    # Remember
                    agent.remember(state_old, final_move, reward, state_new, done)

                    # Snake died
                    if done:
                        run = False
                        # Train long memory, plot result
                        game.reset()
                        agent.episode = cur_episode
                        game.agent_episode = cur_episode
                        agent.train_long_memory()

                        if score > game.top_score:
                            game.top_score = score
                            agent.model.save()

                        total_score += score
                        game.mean_score = np.round((total_score / cur_episode), 3)
                        
                        print(f"Agent{game.agent_num}")
                        print(f"Episode: {cur_episode}")
                        print(f"Generation: {cur_gen}")
                        print(f"Score: {score}")
                        print(f"Top Score: {game.top_score}")
                        print(f"Mean: {game.mean_score}\n")
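One thing the extra loop does not change is the mean. total_score is never reset, while cur_episode starts again at 1 for every agent and every generation, so total_score / cur_episode is only a true average during the first batch of episodes. That is how the output in the question ends up with a Mean of 3.0 while Top Score is 1. A possible way around it, sketched here with a hypothetical RunningMean helper that is not part of the original project, is to count finished games separately and divide by that count:

```python
class RunningMean:
    """Hypothetical helper: tracks the mean score over every finished game."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def update(self, score):
        # Record one finished game and return the mean so far, rounded to 3 places.
        self.total += score
        self.count += 1
        return round(self.total / self.count, 3)


mean = RunningMean()
history = [mean.update(score) for score in [0, 0, 1, 0, 1]]
print(history)  # [0.0, 0.0, 0.333, 0.25, 0.4]
```

In run(), this would mean calling update(score) once inside the if done: block and assigning the result to game.mean_score. Whether the mean should cover all agents or be tracked per agent is a design choice the original code leaves open.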