Create histogram from two arrays

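I have two numpy arrays with the same dimensions: weights and percents. Percents holds the 'real' data, and weights gives how many times each 'real' value occurs in the histogram.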

E.g.:

weights = [[0, 1, 1, 4, 2],
           [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5],
            [1, 2, 3, 4, 5]]

(The percents are identical in every row.)

I want to "multiply" these together, producing weights[x] * [percents[x]], i.e. each value percents[x] repeated weights[x] times:

results = [[0 * [1] + 1 * [2] + 1 * [3] + 4 * [4] + 2 * [5]],
           [0 * [1] + 1 * [2] + 0 * [3] + 3 * [4] + 5 * [5]]]
        = [[2, 3, 4, 4, 4, 4, 5, 5],
           [2, 4, 4, 4, 5, 5, 5, 5, 5]]

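Note that each row can have a different length. Ideally this would be done in numpy, but because of the unequal lengths the result may end up as a list of lists.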
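The "multiplication" above maps directly onto Python list repetition: n * [v] is a list holding v n times, and 0 * [v] is empty. A minimal sketch for a single row, using the first row of the example data, also shows why rows differ in length: each expanded row has sum(weights_row) elements (8 for the first row, 9 for the second).

```python
weights_row = [0, 1, 1, 4, 2]
percents_row = [1, 2, 3, 4, 5]

# n * [v] repeats v n times; a weight of 0 yields an empty list
pieces = [w * [p] for w, p in zip(weights_row, percents_row)]
# pieces is [[], [2], [3], [4, 4, 4, 4], [5, 5]]

# Concatenate the pieces into one row of 'real' data
result_row = [v for piece in pieces for v in piece]
print(result_row)  # [2, 3, 4, 4, 4, 4, 5, 5]
```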

Edit: I've managed to cobble together these nested for loops, but obviously it's not ideal:

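import pandas as pd

# Assumed setup: each row of df holds one row of weights,
# and percents is the 1D list shared by every row
df = pd.DataFrame([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = [1, 2, 3, 4, 5]

list_of_hists = []
# Iterate over row positions so that df.iloc works for any index labels
for index in range(len(df)):
    hist = []
    # Create a list of lists, later to be flattened to 'results'
    for i, percent in enumerate(percents):
        # For each percent, create a list of [percent] * weight
        hist.append([percent] * int(df.iloc[index].values[i]))
    # Flatten the list of lists in hist
    results = [val for list_ in hist for val in list_]
    list_of_hists.append(results)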

You can use a list comprehension together with reduce from functools:

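import functools

weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

# For each row, build [y] * x for every (weight, percent) pair and concatenate;
# the initial [] keeps reduce from failing on an empty row
res = [functools.reduce(lambda a, b: a + b,
                        [x * [y] for x, y in zip(w, p)],
                        [])
       for w, p in zip(weights, percents)]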

Output:

[[2, 3, 4, 4, 4, 4, 5, 5],
 [2, 4, 4, 4, 5, 5, 5, 5, 5]]
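reduce with + builds a new list at every step, so the copying cost can grow quadratically with the number of pieces in a row. If that matters, itertools.chain.from_iterable (a swapped-in technique, not part of the answer above) concatenates the pieces in a single pass. A sketch on the same data:

```python
from itertools import chain

weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

# chain.from_iterable walks the per-pair lists in order without
# re-copying a growing partial result; an empty row simply gives []
res = [list(chain.from_iterable(x * [y] for x, y in zip(w, p)))
       for w, p in zip(weights, percents)]
print(res)  # [[2, 3, 4, 4, 4, 4, 5, 5], [2, 4, 4, 4, 5, 5, 5, 5, 5]]
```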

Or, a pure list-comprehension solution:

res = [[j for i in [x * [y] for x, y in zip(w, p)]
          for j in i]
       for w, p in zip(weights, percents)]

Output:

[[2, 3, 4, 4, 4, 4, 5, 5],
 [2, 4, 4, 4, 5, 5, 5, 5, 5]]
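The intermediate per-pair lists can also be skipped by looping over range(x) for each weight, so every value is emitted directly. A minimal sketch on the same data; if the weights are stored as floats (as the int(...) call in the question's loop suggests), wrap them as int(x) first.

```python
weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

# For every (weight, percent) pair, emit the percent `weight` times
res = [[y for x, y in zip(w, p) for _ in range(x)]
       for w, p in zip(weights, percents)]
print(res)  # [[2, 3, 4, 4, 4, 4, 5, 5], [2, 4, 4, 4, 5, 5, 5, 5, 5]]
```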

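There is np.repeat, which is designed for exactly this kind of operation, but on a 2D array it cannot apply a different set of repeat counts to each row. So you need to work on flattened views of the arrays: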

>>> import numpy as np
>>> weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
>>> percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
>>> np.repeat(percents.ravel(), weights.ravel())
array([2, 3, 4, 4, 4, 4, 5, 5, 2, 4, 4, 4, 5, 5, 5, 5, 5])
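The flattened result no longer records which row each value came from. If a flat layout is acceptable downstream (an assumption about the use case), a parallel array of row labels can be built with another np.repeat, since row i contributes weights[i].sum() values in row-major order. A sketch:

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

values = np.repeat(percents.ravel(), weights.ravel())

# Row i occupies weights[i].sum() consecutive slots in `values`
labels = np.repeat(np.arange(weights.shape[0]), weights.sum(axis=1))

# Values belonging to one row can be selected with a boolean mask
print(values[labels == 1])  # [2 4 4 4 5 5 5 5 5]
```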

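Then you need to choose the indices at which to split it. These are the running totals of the row sums, leaving out the last row: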

>>> np.split(np.repeat(percents.ravel(), weights.ravel()), np.cumsum(np.sum(weights, axis=1)[:-1]))
[array([2, 3, 4, 4, 4, 4, 5, 5]), array([2, 4, 4, 4, 5, 5, 5, 5, 5])]
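Because the weights and percents line up row by row, another option is to call the 1D np.repeat once per row, which avoids computing split points altogether. It replaces np.split with a Python loop over rows; which is faster depends on the shape of your data. A sketch:

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# One 1D np.repeat per row; the result is a list of arrays of unequal length
rows = [np.repeat(p, w) for w, p in zip(weights, percents)]
print(rows)
```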

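Note that np.split is a fairly inefficient operation (it slices the array in a Python-level loop), and because the rows have unequal lengths, the result cannot be a single rectangular NumPy array anyway; you get a list of arrays.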
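If the expanded rows are only needed to build a histogram, the expansion may be unnecessary: np.histogram accepts a weights argument that counts each value w times. A sketch comparing both approaches row by row; the bin edges are a hypothetical choice (one bin per integer percent from 1 to 5), not something from the question.

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# Hypothetical bin edges: 0.5, 1.5, ..., 5.5 (five bins centred on 1..5)
edges = np.arange(0.5, 6.5, 1.0)

results = []
for w, p in zip(weights, percents):
    # Histogram of the expanded data...
    expanded_counts, _ = np.histogram(np.repeat(p, w), bins=edges)
    # ...should match the weighted histogram of the original values
    weighted_counts, _ = np.histogram(p, bins=edges, weights=w)
    results.append((expanded_counts, weighted_counts))
    print(expanded_counts, weighted_counts)
```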