我应该将值存储在字典中还是即时计算？

Question

我有一个问题，我有称为状态和动作的元组，我想计算它 "binary features"。下面描述了计算状态和动作特征的函数。请注意，这只是一个玩具代码。

我有大约 700,000 种状态和动作的组合。我还需要 numpy array/scipy 稀疏矩阵中的特征。

现在的问题是，我必须计算状态和动作的特征数百万次。我有两个选择。

一种选择是预先使用函数将所有700,000个组合下面的计算并存储在字典中。键是（状态，动作），值是二进制特征。

另一种选择是每次我想找到每个状态和动作的二进制特征的值时调用下面的函数。

我的 objective 是为了获得良好的性能并提高内存效率。

from numpy import array
from scipy import sparse

def compute_features(state, action):
    # state and action are 3-tuples of integers. 
    # e.g. (1, 2, 3)
    return array(state) - array(action)

def computer_binary_features(state, action, coord_size):
    # e.g. 
    # features = (1, 0, 2)
    # sizes = (2, 2, 3)
    # Meaning, the size of first coordinate is 2, second is 2 and third is 3.
    # It means the first coordinate can only take value integers 0 to 7.
    #
    # What this function does is turning (1, 0, 2) into binary features.
    # For the first coordinate, the value is 1 and the size is 2, so the binary
    # features of the first coordinate it (0, 1).
    # Second coordinate, the value is 0 and the size is 2. The binary features 
    # is (1, 0)
    # Third coordinate, the value is 2 and the size is 3. The binary features is
    # (0, 0, 1).
    # 
    # So the binary features of (1, 0, 2) is: (0, 1, 1, 0, 0, 0, 1)
    #
    # This function does not do concatenation but rather finding position of ones
    # in the binary features of size sum(sizes).
    # returns a coo sparse 0-1 valued 1 x n matrix.
    features = compute_features(state, action)
    coord_size = array(coord_size)
    col = []
    index = 0
    for i in range(len(features)):
        index = index + coord_size[i]
        col.append(index + features[i] - min_elem[i])
    row = [0] * len(col)
    data = [1] * len(col)
    mtx = sparse.coo_matrix((data, (row, col)), (1, sum(coord_size)),
                            dtype=np.uint8)
    return mtx

Answer 1

如果尽快 return 结果非常重要，那么您应该考虑选项一。但是，您应该记住内存和设置时间开销，这可能太昂贵了。

如果性能根本不是问题，您应该更喜欢选项二。这将使您的代码更简单，并且不会不必要地增加内存消耗和设置时间。

如果性能确实发挥了一些作用，但不必每次都尽可能好，我建议将这两个选项结合起来。要享受两个世界，您可以使用 Memoization。基本上，这意味着您按需计算结果，但就在 returning 它们之前，您按照您的建议将它们放入字典中。该函数将尝试在字典中查找结果，并仅在必要时计算结果。 Here 是一个很好的教程，用于在 python 中实现记忆。

我应该将值存储在字典中还是即时计算？

Should I store the values in dictionary or compute on-the-fly?

python

dictionary

numpy