Create histogram from two arrays
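I have two numpy arrays with the same dimensions: weights and percents. Percents holds the 'real' data, and weights is how many times each 'real' value occurs in the histogram.
E.g.: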
weights = [[0, 1, 1, 4, 2],
           [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5],
            [1, 2, 3, 4, 5]]
(The percents are the same in every row.)
I want to "multiply" these together so that each row yields weights[x] * [percents[x]]:
results = [0 * [1] + 1 * [2] + 1 * [3] + 4 * [4] + 2 * [5],
           0 * [1] + 1 * [2] + 0 * [3] + 3 * [4] + 5 * [5]]
        = [[2, 3, 4, 4, 4, 4, 5, 5],
           [2, 4, 4, 4, 5, 5, 5, 5, 5]]
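Note that the rows can have different lengths. Ideally this would be done in numpy, but because the rows are ragged the result may have to end up as a list of lists.
Edit:
I've been able to cobble together these nested for loops, but obviously it's not ideal: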
list_of_hists = []
for index in df.index:
    hist = []
    # Create a list of lists, later to be flattened to 'results'
    for i, percent in enumerate(percents):
        hist.append(
            # For each percent, create a list of [percent] * weight
            [percent]
            * int(
                df.iloc[index].values[i]
            )
        )
    # flatten the list of lists in hist
    results = [val for list_ in hist for val in list_]
    list_of_hists.append(results)
You can use a list comprehension together with reduce from functools:
import functools
res = [functools.reduce(lambda x, y: x + y,
                        [x * [y] for x, y in zip(w, p)])
       for w, p in zip(weights, percents)]
Output:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]
Or, a solution using only list comprehensions:
res = [[j for i in [x * [y] for x, y in zip(w, p)]
        for j in i]
       for w, p in zip(weights, percents)]
Output:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]
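Another way to write the pure-Python version is with itertools, which avoids building the intermediate x * [y] lists. This is only a sketch; expand_rows is a hypothetical helper name, not something from the question.

```python
from itertools import chain, repeat

weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]


def expand_rows(weights, percents):
    """Repeat each percent by its weight, one output list per row.

    Hypothetical helper for illustration only.
    """
    return [
        # repeat(p, n) yields p exactly n times; chain flattens the pieces
        list(chain.from_iterable(repeat(p, int(w)) for w, p in zip(w_row, p_row)))
        for w_row, p_row in zip(weights, percents)
    ]


res = expand_rows(weights, percents)
print(res)
```

The int(w) conversion is there so the same helper also accepts rows of numpy integers.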
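There is np.repeat, which is designed for this kind of operation, but it does not work in the 2D case. So you need to work with flattened views of the arrays.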
>>> import numpy as np
>>> weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
>>> percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
>>> np.repeat(percents.ravel(), weights.ravel())
array([2, 3, 4, 4, 4, 4, 5, 5, 2, 4, 4, 4, 5, 5, 5, 5, 5])
Then you need to pick the indices at which to split it:
>>> np.split(np.repeat(percents.ravel(), weights.ravel()), np.cumsum(np.sum(weights, axis=1)[:-1]))
[array([2, 3, 4, 4, 4, 4, 5, 5]), array([2, 4, 4, 4, 5, 5, 5, 5, 5])]
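To make the split indices in the one-liner easier to follow, here is the same computation broken into named steps. The names flat, row_lengths and split_points are only illustrative and are not part of the original answer.

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# Repeat every percent by its weight over the flattened arrays.
flat = np.repeat(percents.ravel(), weights.ravel())

# Each output row is as long as the sum of the weights in that row.
row_lengths = weights.sum(axis=1)

# Row boundaries are the running totals; the last total is the end of
# the array, so it is dropped.
split_points = np.cumsum(row_lengths)[:-1]

rows = np.split(flat, split_points)
print(rows)
```

With the example data the row lengths are 8 and 9, so there is a single split point at index 8.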
Note that np.split is a fairly inefficient operation, and so is trying to build an array from rows of unequal length.
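If the flatten-and-split round trip is not needed, one possible alternative is to call np.repeat once per row, which gives a plain list of 1D arrays and never builds a ragged array. This is a sketch rather than a benchmarked recommendation, and the name rows is just for illustration.

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# Within one row both arrays are 1D, so np.repeat accepts the
# per-element counts directly.
rows = [np.repeat(p_row, w_row) for p_row, w_row in zip(percents, weights)]

for row in rows:
    print(row.tolist())
```

This still loops over the rows in Python, so whether it beats the np.split version will depend on how many rows there are and how long they are.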