Replace repeated values with counting up values in NumPy (vectorized)

Question

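I have an array of repeated values that is used to match data points to some ID. How can I replace the IDs with counting-up index values in a vectorized manner?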

Consider the following minimal example:

import numpy as np

n_samples = 10

ids = np.random.randint(0, 500, n_samples)
lengths = np.random.randint(1, 5, n_samples)

x = np.repeat(ids, lengths)
print(x)

Output:

[129 129 129 129 173 173 173 207 207   5 430 147 143 256 256 256 256 230 230  68]

Desired solution:

indices = np.arange(n_samples)
y = np.repeat(indices, lengths)
print(y)

Output:

[0 0 0 0 1 1 1 2 2 3 4 5 6 7 7 7 7 8 8 9]

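However, in the real code I do not have access to variables like ids and lengths, only to x.

It does not matter what the values in x are. I just want an array of counting-up integers, each repeated the same number of times as the corresponding value in x.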

I can come up with solutions using for-loops or np.unique, but both are too slow for my use case.

Does anyone have an idea for a fast algorithm that takes an array like x and returns an array like y?

Answer 1

You can do:

y = np.r_[False, x[1:] != x[:-1]].cumsum()

Or with one less temporary array:

y = np.empty(len(x), int)
y[0] = 0
np.cumsum(x[1:] != x[:-1], out=y[1:])
print(y)
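
To see why this works, here is a small sketch that runs both variants on the array printed in the question and compares them with the desired output. The names changed, y2 and expected are only illustrative and do not appear in the original post. The idea is that x[1:] != x[:-1] is True exactly where a new run starts, so its cumulative sum counts how many runs have started so far.

```python
import numpy as np

# Fixed input copied from the question's printed output.
x = np.array([129, 129, 129, 129, 173, 173, 173, 207, 207, 5, 430,
              147, 143, 256, 256, 256, 256, 230, 230, 68])

# True wherever an element differs from its predecessor (a new run starts).
changed = x[1:] != x[:-1]

# Variant 1: prepend False so the first element gets index 0,
# then count the run starts with a cumulative sum.
y = np.r_[False, changed].cumsum()

# Variant 2: write the cumulative sum into a preallocated buffer.
y2 = np.empty(len(x), dtype=int)
y2[0] = 0
np.cumsum(changed, out=y2[1:])

# Desired output copied from the question.
expected = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 9])
print(np.array_equal(y, expected), np.array_equal(y2, expected))
```

Two things to keep in mind: both variants assume x is non-empty, and two neighbouring runs that happen to carry the same ID are counted as a single run, since they cannot be told apart from x alone.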
