Subsampling a 1D array of integers so that the sum hits a target value in Python

Question

I have two 1D arrays of integers that differ somewhat, for example:

a = [1,2,2,0,3,5]
b = [0,0,3,2,0,0]

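I want the sum of each array to equal the smaller of the two arrays' sums. However, I want to keep the values as integers rather than floats, so division is not an option. The solution seems to be to subsample the array with the larger sum so that its sum equals that of the array with the smaller sum: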

target = min(sum(a), sum(b))

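However, I cannot find a function that performs this kind of subsampling. The only ones I found are in scipy, but they seem to be designed specifically for audio signals. An alternative is a function from the scikit-bio package, but it does not work with Python 3.7.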

Answer 1

You can convert the array into indices, sample the indices, and convert back to values, as follows:

import numpy as np
np.random.seed(0)
a = np.array([1,2,2,0,3,5])
b = np.array([0,0,3,2,0,0])
# Target sum: the smaller of the two totals (a plain integer, usable as a slice bound)
target = min(a.sum(), b.sum())

# Build an array of indices in which each index i appears a[i] times
a_idx = np.repeat(np.arange(len(a)), a)
# [0, 1, 1, 2, 2, 4, 4, 4, 5, 5, 5, 5, 5]

# Randomly shuffle the indices and keep the first `target` of them
a_sub_idx = np.random.permutation(a_idx)[:target]
# [4, 1, 2, 2, 5]

# Count the number of occurrences of each sampled index
a_sub_idx, a_sub_vals = np.unique(a_sub_idx, return_counts=True)
# Write the counts back into an integer array shaped like "a"
a_sub = np.zeros(a.shape, dtype=int)
a_sub[a_sub_idx] = a_sub_vals
# [0, 1, 2, 0, 1, 1]
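
The same idea can be wrapped in a reusable function. The following is only a minimal sketch: the name subsample_to_sum and its parameters are hypothetical (not part of numpy or any other library), and it swaps in np.random.default_rng with rng.choice(..., replace=False) and np.bincount in place of the permutation and np.unique steps above. It assumes the counts are non-negative integers.

```python
import numpy as np

def subsample_to_sum(counts, target, rng=None):
    """Hypothetical helper: randomly remove units from `counts` until it sums to `target`."""
    counts = np.asarray(counts)
    target = int(target)
    if target < 0 or target > counts.sum():
        raise ValueError("target must lie between 0 and counts.sum()")
    rng = np.random.default_rng() if rng is None else rng
    # One entry per unit: index i appears counts[i] times
    units = np.repeat(np.arange(len(counts)), counts)
    # Draw `target` units without replacement
    picked = rng.choice(units, size=target, replace=False)
    # Count how many units were kept for each index (integer result)
    return np.bincount(picked, minlength=len(counts))

a = np.array([1, 2, 2, 0, 3, 5])
b = np.array([0, 0, 3, 2, 0, 0])
target = min(a.sum(), b.sum())
a_sub = subsample_to_sum(a, target, rng=np.random.default_rng(0))
print(a_sub, a_sub.sum())
```

Because units are drawn without replacement, no entry of the result can exceed the corresponding entry of the input, and the result stays an integer array whose sum equals the target.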
