Efficiently count the number of occurrences of unique subarrays in NumPy?
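I have an array of shape (128, 36, 8), and I want to find the number of occurrences of each unique subarray of length 8 along the last dimension.
I know about np.unique and np.bincount, but they seem to work on elements rather than subarrays. I have seen this question, but it is about finding the first occurrence of a particular subarray, not the counts of all unique subarrays.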
I'm not sure this is the most efficient way to do it, but this should work.
arr = arr.reshape(128*36, 8)
unique_ = []
occurence_ = []
for sub in arr:
    row = sub.tolist()
    if row not in unique_:
        # First time this subarray is seen
        unique_.append(row)
        occurence_.append(1)
    else:
        occurence_[unique_.index(row)] += 1
# enumerate() supplies the index that pairs each subarray with its count
for index_, u in enumerate(unique_):
    print(u, "occurrence: %s" % occurence_[index_])
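The loop above searches the unique_ list once per row, which gets slow when there are many distinct subarrays. A possible variant, shown here only as a sketch, keys a dict on tuples so that each lookup is a hash lookup instead. The function name count_rows_with_dict and the small demo array are made up for illustration and are not part of the original answer.

```python
import numpy as np

def count_rows_with_dict(arr):
    """Count identical subarrays along the last axis using a dict keyed by tuples."""
    rows = arr.reshape(-1, arr.shape[-1])
    counts = {}
    for row in rows.tolist():
        key = tuple(row)  # tuples are hashable, lists are not
        counts[key] = counts.get(key, 0) + 1
    return counts

# Small illustrative input (not the original (128, 36, 8) data)
demo = np.array([[[0, 0, 2], [1, 1, 1]],
                 [[1, 1, 1], [1, 2, 0]]])
row_counts = count_rows_with_dict(demo)
for row, n in row_counts.items():
    print(row, "occurrence:", n)
```

This is still a Python-level loop, so it will likely be slower than a vectorised NumPy approach on large inputs.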
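The question states that the input array has shape (128, 36, 8) and that we are interested in finding unique subarrays of length 8 along the last dimension.
So I am assuming that uniqueness is evaluated with the first two dimensions merged together. Let us take A as the input 3D array.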
Get the number of unique subarrays
# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])
# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar = Ar[sorted_idx,:]
# Get the count of rows that have at least one TRUE value
# indicating presence of unique subarray there
unq_out = np.any(np.diff(sorted_Ar,axis=0),1).sum()+1
Sample run -
In [159]: A # A is (2,2,3)
Out[159]:
array([[[0, 0, 0],
        [0, 0, 2]],

       [[0, 0, 2],
        [2, 0, 1]]])
In [160]: unq_out
Out[160]: 3
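For anyone who wants to reproduce the sample run, here is a self-contained sketch of the same lexsort-and-diff idea. The wrapper name count_unique_subarrays is hypothetical, and the sketch assumes the input contains at least one subarray (an empty input would wrongly report 1).

```python
import numpy as np

def count_unique_subarrays(A):
    """Number of distinct subarrays along the last axis of A (assumes A is non-empty)."""
    # Merge all leading dimensions so each row is one subarray
    Ar = A.reshape(-1, A.shape[-1])
    # Lexicographic sort places identical rows next to each other
    sorted_Ar = Ar[np.lexsort(Ar.T)]
    # A row starts a new group when it differs from the previous row in any column
    is_new = np.any(np.diff(sorted_Ar, axis=0) != 0, axis=1)
    return int(is_new.sum()) + 1

A = np.array([[[0, 0, 0], [0, 0, 2]],
              [[0, 0, 2], [2, 0, 1]]])
n_unique = count_unique_subarrays(A)
print(n_unique)  # 3, matching the sample run
```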
Get the number of occurrences of each unique subarray
# Reshape the 3D array to a 2D array merging the first two dimensions
Ar = A.reshape(-1,A.shape[2])
# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(Ar.T)
sorted_Ar = Ar[sorted_idx,:]
# Get IDs for each element based on their uniqueness
id = np.append([0],np.any(np.diff(sorted_Ar,axis=0),1).cumsum())
# Get counts for each ID as the final output
unq_count = np.bincount(id)
Sample run -
In [64]: A
Out[64]:
array([[[0, 0, 2],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 2, 0]]])
In [65]: unq_count
Out[65]: array([1, 2, 1], dtype=int64)
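As a cross-check, and assuming a NumPy version in which np.unique accepts the axis argument (1.13 or later, as far as I know), the same counts can probably be obtained in a single call. The sketch below runs both approaches on the sample array. The groups may come out in a different order in the two approaches, so the counts are compared after sorting.

```python
import numpy as np

A = np.array([[[0, 0, 2], [1, 1, 1]],
              [[1, 1, 1], [1, 2, 0]]])
Ar = A.reshape(-1, A.shape[-1])

# lexsort-based counts, following the approach above
sorted_Ar = Ar[np.lexsort(Ar.T)]
group_id = np.append([0], np.any(np.diff(sorted_Ar, axis=0) != 0, axis=1).cumsum())
lexsort_counts = np.bincount(group_id)

# np.unique treats each row as one item when axis=0 is given
unique_rows, unique_counts = np.unique(Ar, axis=0, return_counts=True)

print(unique_rows)
print(unique_counts)
# The two approaches should agree once ordering is ignored
print(sorted(lexsort_counts.tolist()) == sorted(unique_counts.tolist()))
```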
Here I have modified @Divakar's very helpful answer to return the unique subarrays themselves along with their counts, so that the output matches that of collections.Counter.most_common():
import numpy as np

# The function name is illustrative; the original snippet was a bare function body
def most_common_rows(arr):
    # Get the array in 2D form.
    arr = arr.reshape(-1, arr.shape[-1])
    # Lexicographically sort
    sorted_arr = arr[np.lexsort(arr.T), :]
    # Get the indices of the last row of each group (the next row differs)
    diff_idx = np.where(np.any(np.diff(sorted_arr, axis=0), 1))[0]
    # Get the unique rows
    unique_rows = [sorted_arr[i] for i in diff_idx] + [sorted_arr[-1]]
    # Get the number of occurrences of each unique array (the -1 is needed at
    # the beginning, rather than 0, because of fencepost concerns)
    counts = np.diff(
        np.append(np.insert(diff_idx, 0, -1), sorted_arr.shape[0] - 1))
    # Return the (row, count) pairs sorted by count
    return sorted(zip(unique_rows, counts), key=lambda x: x[1], reverse=True)
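For comparison, this is roughly what collections.Counter.most_common() itself returns for the rows of a small array. It is only a reference sketch for checking the vectorised version against. The sample array and the variable names are assumptions for illustration.

```python
from collections import Counter

import numpy as np

arr = np.array([[[0, 0, 2], [1, 1, 1]],
                [[1, 1, 1], [1, 2, 0]]])
rows = arr.reshape(-1, arr.shape[-1])

# Lists are unhashable, so convert each row to a tuple before counting
reference = Counter(tuple(r) for r in rows.tolist()).most_common()

# Each entry is a (row, count) pair, most frequent row first
for row, count in reference:
    print(row, count)
```

Note that the keys here are plain tuples, whereas the snippet above yields NumPy arrays as the first element of each pair.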