Miscalculating recursive state probability

def get_dir(state, max_depth):
    #Possible directions and their corresponding scores so far
    paths = {'w':1,'a':1,'s':1,'d':1}
    #Rate each direction
    for dir in paths:
        #Takes a game state and a direction as input and returns
        #a list of all possible states that could occur from moving in that direction
        children = successors(state, dir)
        if children:
            children = [children[0][:10], children[1][:10]]
            #Weight the probability of each state depending on whether a 2 or a 4 was spawned
            weights = {0:.9,1:.1}
            for section in weights:
                for board in children[section]:
                                                                 #PROBLEM HERE
                    paths[dir] += rank_branch(board, max_depth,  (weights[section]*(1/(num_empty(board)))))
        else:
            paths[dir] = False

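I am using the function above to choose a move direction in 2048. I am trying to weight each state's heuristic rank by the probability of being able to reach that state.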

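To do this, at each layer I multiply the probability of a tile with that number spawning (.9 for a 2, .1 for a 4) by one over the number of places it could have spawned in (the number of empty tiles).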

My code:

weights[section]*(1/(num_empty(board)))

When I print the probability variable, it is always too high. Why does it always overestimate the odds of reaching a given state?

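If each board in the children variable is the state after the random tile has spawned, wouldn't you need to add 1 to the number of empty tiles, since the cell where the new tile spawned was empty before the tile appeared there?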

weights[section]*(1/(num_empty(board)+1))
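As a quick sanity check of the off-by-one, the sketch below sums the probabilities over every successor of a single move. It assumes a board is a 4x4 list of lists with 0 marking an empty cell; `num_empty` and `spawn_successors` here are hypothetical stand-ins for the question's helpers, not the asker's actual code. Dividing by the post-spawn empty count should push the total above 1, while adding 1 back should give a total of 1.

```python
def num_empty(board):
    """Count empty cells (0 marks an empty cell in this sketch)."""
    return sum(cell == 0 for row in board for cell in row)

def spawn_successors(board):
    """Return [boards_with_new_2, boards_with_new_4], one board per empty cell."""
    result = [[], []]
    for section, tile in enumerate((2, 4)):
        for r, row in enumerate(board):
            for c, cell in enumerate(row):
                if cell == 0:
                    child = [list(x) for x in board]
                    child[r][c] = tile
                    result[section].append(child)
    return result

weights = {0: 0.9, 1: 0.1}

def total_probability(board, correction):
    """Sum per-child probabilities using 1 / (num_empty(child) + correction)."""
    children = spawn_successors(board)
    return sum(
        weights[section] / (num_empty(child) + correction)
        for section in weights
        for child in children[section]
    )

# Board after a move, before the random tile spawns: 4 empty cells.
board = [
    [2, 4, 8, 16],
    [2, 4, 8, 16],
    [2, 4, 8, 16],
    [0, 0, 0, 0],
]

uncorrected = total_probability(board, 0)  # divides by 3, the post-spawn count
corrected = total_probability(board, 1)    # divides by 4, the pre-spawn count
print(uncorrected, corrected)
```

With four empty cells before the spawn, the uncorrected total comes out near 4/3 and the corrected total near 1, which matches the "always too high" symptom. Note that the uncorrected form would also divide by zero when the move leaves only one empty cell.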

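That said, calling a function each time seems a bit wasteful, because the probability of reaching a state after moving in a given direction is the same for all successors (apart from the difference between spawning a 2 tile and a 4 tile).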

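A better way to calculate the probability would be to simply count the successors and work out the odds of each one being picked from that pool.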

prob = {0:9/(len(children[0])*9+len(children[1])),1:1/(len(children[0])*9+len(children[1]))}
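The counting approach can be checked in isolation. The sketch below assumes `children[0]` holds the boards where a 2 spawned and `children[1]` the boards where a 4 spawned, as in the question, that both lists are non-empty, and that they are counted before any truncation such as `[:10]`. `spawn_probabilities` is a hypothetical helper name. The whole sum `len(children[0])*9 + len(children[1])` has to sit in the denominator, otherwise operator precedence adds `len(children[1])` to the quotient instead.

```python
def spawn_probabilities(children):
    """Per-board probability for each section, from successor counts alone.

    A 2 is nine times as likely as a 4, so each 2-board counts as 9 tickets
    and each 4-board as 1 ticket in a single pool.
    """
    pool = len(children[0]) * 9 + len(children[1])
    return {0: 9 / pool, 1: 1 / pool}

# Placeholder successors: only the list lengths matter here.
children = [["board"] * 5, ["board"] * 5]
prob = spawn_probabilities(children)

# Every successor's probability added up should come to 1.
total = prob[0] * len(children[0]) + prob[1] * len(children[1])
print(prob, total)
```

With five empty cells this gives .9/5 per 2-board and .1/5 per 4-board, the same values as the corrected per-board formula, but computed once per direction rather than once per board.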