dtype=int 的 2D Numpy 数组:最大化相同整数之间的距离
2D Numpy array of dtype=int: maximize distance between identical integers
对双重 posting 表示歉意。我还没有找到解决方案。
前post:
问题
我有一个随机打乱的 dtype=int
二维数组,我想最大化相同整数之间的距离。这是一个优化问题。
例如我有:
np.array([1, 2, 2, 2],
[3, 2, 1, 1],
[1, 3, 3, 1],
[1, 5, 5, 1])
一些整数在0轴和1轴方向上彼此太靠近了。例如。两者 5
。是否有启发式或 scipy 优化方法可以在某种程度上解决此问题?
在此先感谢您和亲切的问候
我利用 scipy.ndimage.distance_transform_edt
和随机抽样发明了自己的小方法。它远非最佳,但它可以很好地完成工作,具体取决于您的目标:
import numpy as np
from scipy.ndimage import distance_transform_edt
def maximize_arr_replicate_distances(input_arr: np.array) -> np.array:
# Assert the input array
assert isinstance(input_arr, np.ndarray), "'input_arr' is not a Numpy array."
assert input_arr.ndim == 2, f"'input_arr' is not an 'm x n' matrix."
assert np.issubdtype(input_arr.dtype, np.integer), "'input_arr' is not of dtype class 'integer'."
# Bin: Bin all unique elements into subarrays
unique_elements = np.unique(input_arr)
binned_uniques = [input_arr[input_arr == i].tolist() for i in unique_elements]
# Determine a sampling order: randomly sample from subarrays without replacement and append to new list
sampling_order = []
while(any(binned_uniques)):
random_index = np.random.randint(unique_elements.size)
if binned_uniques[random_index]:
sampling_order.append(binned_uniques[random_index].pop())
# Distribute elements: Append elements to empty array based on sampling order and maximum possible distance between replicates
output_arr = np.full(input_arr.shape, np.nan)
for i in sampling_order.copy():
if np.any(output_arr == i):
distance_transform = distance_transform_edt(output_arr != i)
mask = np.in1d(output_arr, unique_elements[unique_elements != i]).reshape(input_arr.shape)
distance_transform_masked = np.ma.masked_array(distance_transform, mask=mask)
distance_transform_max = np.argwhere(distance_transform_masked == np.amax(distance_transform_masked))
rand_max_y, rand_max_x = distance_transform_max[np.random.randint(len(distance_transform_max))]
output_arr[rand_max_y, rand_max_x] = i
else:
nan_indices = np.argwhere(np.isnan(output_arr))
rand_nan_y, rand_nan_x = nan_indices[np.random.randint(len(nan_indices))]
output_arr[rand_nan_y, rand_nan_x] = i
return output_arr.astype(int)
input_arr = np.array([[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
print(maximize_arr_replicate_distances(input_arr))
# [[7 0 0 0 1 0 0 0 1 0 0 7]
# [0 0 0 0 0 0 0 0 0 0 0 1]
# [1 0 0 0 0 0 7 0 0 0 0 0]
# [0 0 0 0 0 0 0 1 0 0 0 0]
# [0 0 0 7 0 0 0 0 0 0 1 0]
# [7 0 0 0 0 0 0 0 7 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 1 0 7 0 0 0 0 1 0 7]]
它适用于各种输入。
请注意,该算法偏向于数组的边界。例如。如果重复 10,000 次,1
将具有以下频率分布 [%]:
from typing import Callable
def evaluate(input_arr: np.array, function: Callable[[np.array], np.array], sample_index: int, iterations: int = 10_000) -> np.array:
sum_arr = np.zeros((input_arr.shape))
for i in range(iterations):
sum_arr += np.where(function(input_arr) == sample_index, 1, 0)
hit_frequencies = sum_arr / sum_arr.sum() * 100 # [%]
return np.around(hit_frequencies, 2)
maximize_replicate_distances = evaluate(input_arr, maximize_arr_replicate_distances, 1)
print(maximize_replicate_distances)
# Frequency distribution of sample replicate '1' after distance-maximizing shuffling (n = 10,000) [%]:
# [[4.06 2.1 0.93 1.25 1.34 2.02 2.1 1.38 1.21 0.93 2.15 4.04]
# [2.48 0.26 0.15 0.3 0.41 0.85 0.82 0.42 0.28 0.13 0.27 2.42]
# [1.07 0.14 0.13 0.46 0.5 0.58 0.63 0.55 0.44 0.12 0.14 1.08]
# [1.55 0.4 0.9 1.49 0.82 0.7 0.77 0.8 1.48 0.84 0.44 1.61]
# [1.67 0.44 0.92 1.48 0.81 0.74 0.76 0.81 1.42 0.88 0.43 1.72]
# [1.04 0.14 0.12 0.4 0.55 0.59 0.6 0.52 0.47 0.14 0.16 1.09]
# [2.44 0.24 0.15 0.26 0.42 0.88 0.81 0.41 0.3 0.15 0.26 2.33]
# [3.99 2.11 0.97 1.33 1.39 1.98 2.01 1.42 1.3 0.97 2.12 3.96]]
对双重 posting 表示歉意。我还没有找到解决方案。
前post:
问题
我有一个随机打乱的 dtype=int
二维数组,我想最大化相同整数之间的距离。这是一个优化问题。
例如我有:
np.array([1, 2, 2, 2],
[3, 2, 1, 1],
[1, 3, 3, 1],
[1, 5, 5, 1])
一些整数在0轴和1轴方向上彼此太靠近了。例如。两者 5
。是否有启发式或 scipy 优化方法可以在某种程度上解决此问题?
在此先感谢您和亲切的问候
我利用 scipy.ndimage.distance_transform_edt
和随机抽样发明了自己的小方法。它远非最佳,但它可以很好地完成工作,具体取决于您的目标:
import numpy as np
from scipy.ndimage import distance_transform_edt
def maximize_arr_replicate_distances(input_arr: np.array) -> np.array:
# Assert the input array
assert isinstance(input_arr, np.ndarray), "'input_arr' is not a Numpy array."
assert input_arr.ndim == 2, f"'input_arr' is not an 'm x n' matrix."
assert np.issubdtype(input_arr.dtype, np.integer), "'input_arr' is not of dtype class 'integer'."
# Bin: Bin all unique elements into subarrays
unique_elements = np.unique(input_arr)
binned_uniques = [input_arr[input_arr == i].tolist() for i in unique_elements]
# Determine a sampling order: randomly sample from subarrays without replacement and append to new list
sampling_order = []
while(any(binned_uniques)):
random_index = np.random.randint(unique_elements.size)
if binned_uniques[random_index]:
sampling_order.append(binned_uniques[random_index].pop())
# Distribute elements: Append elements to empty array based on sampling order and maximum possible distance between replicates
output_arr = np.full(input_arr.shape, np.nan)
for i in sampling_order.copy():
if np.any(output_arr == i):
distance_transform = distance_transform_edt(output_arr != i)
mask = np.in1d(output_arr, unique_elements[unique_elements != i]).reshape(input_arr.shape)
distance_transform_masked = np.ma.masked_array(distance_transform, mask=mask)
distance_transform_max = np.argwhere(distance_transform_masked == np.amax(distance_transform_masked))
rand_max_y, rand_max_x = distance_transform_max[np.random.randint(len(distance_transform_max))]
output_arr[rand_max_y, rand_max_x] = i
else:
nan_indices = np.argwhere(np.isnan(output_arr))
rand_nan_y, rand_nan_x = nan_indices[np.random.randint(len(nan_indices))]
output_arr[rand_nan_y, rand_nan_x] = i
return output_arr.astype(int)
input_arr = np.array([[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
print(maximize_arr_replicate_distances(input_arr))
# [[7 0 0 0 1 0 0 0 1 0 0 7]
# [0 0 0 0 0 0 0 0 0 0 0 1]
# [1 0 0 0 0 0 7 0 0 0 0 0]
# [0 0 0 0 0 0 0 1 0 0 0 0]
# [0 0 0 7 0 0 0 0 0 0 1 0]
# [7 0 0 0 0 0 0 0 7 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 1 0 7 0 0 0 0 1 0 7]]
它适用于各种输入。
请注意,该算法偏向于数组的边界。例如。如果重复 10,000 次,1
将具有以下频率分布 [%]:
from typing import Callable
def evaluate(input_arr: np.array, function: Callable[[np.array], np.array], sample_index: int, iterations: int = 10_000) -> np.array:
sum_arr = np.zeros((input_arr.shape))
for i in range(iterations):
sum_arr += np.where(function(input_arr) == sample_index, 1, 0)
hit_frequencies = sum_arr / sum_arr.sum() * 100 # [%]
return np.around(hit_frequencies, 2)
maximize_replicate_distances = evaluate(input_arr, maximize_arr_replicate_distances, 1)
print(maximize_replicate_distances)
# Frequency distribution of sample replicate '1' after distance-maximizing shuffling (n = 10,000) [%]:
# [[4.06 2.1 0.93 1.25 1.34 2.02 2.1 1.38 1.21 0.93 2.15 4.04]
# [2.48 0.26 0.15 0.3 0.41 0.85 0.82 0.42 0.28 0.13 0.27 2.42]
# [1.07 0.14 0.13 0.46 0.5 0.58 0.63 0.55 0.44 0.12 0.14 1.08]
# [1.55 0.4 0.9 1.49 0.82 0.7 0.77 0.8 1.48 0.84 0.44 1.61]
# [1.67 0.44 0.92 1.48 0.81 0.74 0.76 0.81 1.42 0.88 0.43 1.72]
# [1.04 0.14 0.12 0.4 0.55 0.59 0.6 0.52 0.47 0.14 0.16 1.09]
# [2.44 0.24 0.15 0.26 0.42 0.88 0.81 0.41 0.3 0.15 0.26 2.33]
# [3.99 2.11 0.97 1.33 1.39 1.98 2.01 1.42 1.3 0.97 2.12 3.96]]