Randomly sample sub-arrays from a 2D array in Python
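Question:
Suppose I have a 2D array and I want to randomly sample (Monte Carlo style) smaller 2D sub-arrays from it, as shown by the black patches in the figure below. I am looking for an efficient way to do this.
Prospective (but partial) solution:
After several hours of searching I came across a function that partially does what I want, but it seems to lack the ability to sample patches at random locations. At least I don't think it can sample from random locations judging by its arguments, although it does have a random_state parameter that I don't understand.
sklearn.feature_extraction.image.extract_patches_2d(image, patch_size, max_patches=None, random_state=None)
Question:
Select random patch coordinates (2D sub-arrays) and use them to slice a patch out of the larger array, as shown in the figure above. The randomly sampled patches are allowed to overlap.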
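Here is a sampler that cuts a sample from an array of any dimensionality. It uses functions to control where the cut starts and how wide the cut is along each axis.
The parameters are:
arr
- The input numpy array.
loc_sampler_fn
- The function used to place the corner of the box. If you want the corner sampled uniformly from anywhere along the axis, use np.random.uniform. If you want the corner closer to the center of the array, use np.random.normal. Either way, the function needs to be told what range to sample over, which brings us to the next parameter.
loc_dim_param
- The name of the loc_sampler_fn parameter that receives the size of each axis. With np.random.uniform as the location sampler, we want to sample over the entire range of the axis. np.random.uniform has two parameters, low and high, so passing the length of the axis as high makes it sample uniformly over the whole axis. In other words, if the axis has length 120 we want np.random.uniform(low=0, high=120), so we set loc_dim_param='high'.
loc_params
- A dictionary of any additional parameters for loc_sampler_fn. Continuing the example, we need to pass low=0 to np.random.uniform, so we pass loc_params={'low':0}.
The shape of the box is handled in basically the same way. If you want the height and width of the box sampled uniformly from 3 to 10, pass shape_sampler_fn=np.random.uniform, shape_dim_param=None (since we are not using the axis size for anything), and shape_params={'low':3, 'high':11}.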
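To make the mechanics concrete, here is a small sketch of how a parameter name such as 'high' and a dictionary such as {'low': 0} could be combined into one sampler call. The helper name `draw_for_axis` is hypothetical and exists only for this illustration; it is not part of the function in this answer.

```python
import numpy as np

def draw_for_axis(sampler_fn, dim_param, params, axis_len):
    """Hypothetical helper: call `sampler_fn` once for an axis of length `axis_len`."""
    kwargs = dict(params)              # copy so the caller's dict is left untouched
    if dim_param:                      # e.g. 'high' for np.random.uniform
        kwargs[dim_param] = axis_len   # inject the axis length under that name
    return int(sampler_fn(**kwargs))   # truncate the float draw to an integer

np.random.seed(0)
# Same as int(np.random.uniform(low=0, high=120)): an index in 0..119
start = draw_for_axis(np.random.uniform, 'high', {'low': 0}, 120)
# dim_param is None, so the axis length is ignored: a width in 3..10
width = draw_for_axis(np.random.uniform, None, {'low': 3, 'high': 11}, 120)
print(start, width)
```

Because the float draw is truncated with int(), an upper bound of 11 is what makes a width of 10 reachable.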
import numpy as np
def box_sampler(arr,
                loc_sampler_fn,
                loc_dim_param,
                loc_params,
                shape_sampler_fn,
                shape_dim_param,
                shape_params):
    '''
    Extracts a sample cut from `arr`.
    Parameters:
    -----------
    arr : array
        The numpy array to sample a box from.
    loc_sampler_fn : function
        The function to determine where the minimum coordinate
        for each axis should be placed.
    loc_dim_param : string or None
        The parameter in `loc_sampler_fn` that should use the axes
        dimension size.
    loc_params : dict
        Parameters to pass to `loc_sampler_fn`.
    shape_sampler_fn : function
        The function to determine the width of the sample cut
        along each axis.
    shape_dim_param : string or None
        The parameter in `shape_sampler_fn` that should use the
        axes dimension size.
    shape_params : dict
        Parameters to pass to `shape_sampler_fn`.
    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
    the sampled subsection with the same dimensionality of arr.
        slices :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    # Work on copies so the caller's dictionaries are not modified
    loc_params = dict(loc_params)
    shape_params = dict(shape_params)
    slices = []
    for dim in arr.shape:
        # Inject the length of the current axis into the sampler arguments
        if loc_dim_param:
            loc_params.update({loc_dim_param: dim})
        if shape_dim_param:
            shape_params.update({shape_dim_param: dim})
        start = int(loc_sampler_fn(**loc_params))
        stop = start + int(shape_sampler_fn(**shape_params))
        slices.append(slice(start, stop))
    # Index with a tuple; newer NumPy versions no longer accept a list of slices here
    return slices, arr[tuple(slices)]
Example of a uniform cut on a 2D array with a width between 3 and 9:
a = np.random.randint(0, 1+1, size=(100,150))
box_sampler(a,
            np.random.uniform, 'high', {'low':0},
            np.random.uniform, None, {'low':3, 'high':10})
# returns:
([slice(49, 55, None), slice(86, 89, None)],
 array([[0, 0, 1],
        [0, 1, 1],
        [0, 0, 0],
        [0, 0, 1],
        [1, 1, 1],
        [1, 1, 0]]))
Example of taking 2x2x2 chunks from a 10x20x30 3D array:
a = np.random.randint(0,2,size=(10,20,30))
box_sampler(a, np.random.uniform, 'high', {'low':0},
            np.random.uniform, None, {'low':2, 'high':2})
# returns:
([slice(7, 9, None), slice(9, 11, None), slice(19, 21, None)],
 array([[[0, 1],
         [1, 0]],
        [[0, 1],
         [1, 1]]]))
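For the center-biased corners mentioned earlier, note that a raw np.random.normal draw can land below 0 or beyond the end of the axis. One way to handle this is a thin wrapper that centers the draw on the axis midpoint and clips it. The function `centered_normal` and its `rel_scale` argument are hypothetical names chosen for this sketch, and clipping is only one of several reasonable choices.

```python
import numpy as np

def centered_normal(dim, rel_scale=0.15):
    """Hypothetical location sampler: a normal draw around the axis midpoint,
    clipped so the result is a valid start index in [0, dim - 1]."""
    draw = np.random.normal(loc=dim / 2, scale=rel_scale * dim)
    return float(np.clip(draw, 0, dim - 1))

np.random.seed(1)
starts = [int(centered_normal(dim=100)) for _ in range(1000)]
mean_start = sum(starts) / len(starts)
print(min(starts), max(starts), mean_start)
```

With the box_sampler above, this could be passed as loc_sampler_fn=centered_normal, loc_dim_param='dim', loc_params={'rel_scale': 0.15}, so that each axis length arrives through the `dim` argument.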
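Update based on the comments.
For your specific purpose, it looks like you want a rectangular sample where the starting corner is uniformly sampled from anywhere in the array, and where the width of the sample along each axis is uniformly sampled but can be limited.
Here is a function that generates these samples. min_width and max_width can accept iterables of integers (such as a tuple) or a single integer.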
def uniform_box_sampler(arr, min_width, max_width):
    '''
    Extracts a sample cut from `arr`.
    Parameters:
    -----------
    arr : array
        The numpy array to sample a box from
    min_width : int or tuple
        The minimum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`
    max_width : int or tuple
        The maximum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`
    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
    the sampled subsection with the same dimensionality of arr.
        slices :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    if isinstance(min_width, (tuple, list)):
        assert len(min_width)==arr.ndim, 'Dimensions of `min_width` and `arr` must match'
    else:
        min_width = (min_width,)*arr.ndim
    if isinstance(max_width, (tuple, list)):
        assert len(max_width)==arr.ndim, 'Dimensions of `max_width` and `arr` must match'
    else:
        max_width = (max_width,)*arr.ndim
    slices = []
    for dim, mn, mx in zip(arr.shape, min_width, max_width):
        # Start corner: uniform over the whole axis, truncated to an integer index
        start = int(np.random.uniform(0,dim))
        # Width: uniform over the integers mn..mx (hence mx+1 before truncation)
        stop = start + int(np.random.uniform(mn, mx+1))
        slices.append(slice(start, stop))
    # Index with a tuple; newer NumPy versions no longer accept a list of slices here
    return slices, arr[tuple(slices)]
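Example of generating a box cut that starts uniformly anywhere in the array, where the height is a random uniform draw from 1 to 4 and the width is a random uniform draw from 2 to 6 (just to show). In this case the box is 3 by 4, starting at the 66th row and the 19th column.
x = np.random.randint(0,2,size=(100,100))
uniform_box_sampler(x, (1,2), (4,6))
# returns:
([slice(65, 68, None), slice(18, 22, None)],
 array([[1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 1, 1, 0]]))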
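One caveat with the samplers above: the start is drawn over the whole axis, so a box that starts near the far edge is silently truncated by slicing and can come out narrower than the requested minimum. If every patch must lie entirely inside the array, one option is to draw the width first and then restrict where the start may fall. The sketch below assumes that behavior is wanted (it changes the distribution of the start corner) and uses NumPy's Generator API; the name `uniform_box_sampler_inside` is made up for this illustration.

```python
import numpy as np

def uniform_box_sampler_inside(arr, min_width, max_width, rng=None):
    """Sketch: sample a box whose integer width along each axis lies in
    [min_width, max_width] and which always fits fully inside `arr`."""
    rng = np.random.default_rng() if rng is None else rng
    if np.isscalar(min_width):
        min_width = (min_width,) * arr.ndim
    if np.isscalar(max_width):
        max_width = (max_width,) * arr.ndim
    slices = []
    for dim, mn, mx in zip(arr.shape, min_width, max_width):
        mx = min(mx, dim)                        # a box cannot exceed the axis
        if mn > mx:
            raise ValueError('min_width is larger than the axis or max_width')
        width = int(rng.integers(mn, mx + 1))    # upper bound is exclusive
        start = int(rng.integers(0, dim - width + 1))
        slices.append(slice(start, start + width))
    return slices, arr[tuple(slices)]

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(100, 100))
slices, patch = uniform_box_sampler_inside(x, (1, 2), (4, 6), rng=rng)
print(slices, patch.shape)
```

Passing an explicit `rng` keeps runs reproducible without touching the global np.random state.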
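So it seems that your issue with sklearn.feature_extraction.image.extract_patches_2d is that it forces you to specify a single patch size, whereas you are looking for different patches of random size.
One thing to note here is that your result cannot be a single NumPy array (unlike the result of the sklearn function), because an array must have rows and columns of uniform length. So your output needs to be some other data structure that holds the differently shaped arrays.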
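To illustrate the point about containers: patches that all share one shape can be stacked into a single array, whereas patches of different shapes have to live in a list or a similar structure. The snippet below is only a sketch with arbitrary example coordinates.

```python
import numpy as np

arr = np.arange(49).reshape(7, 7)

# Same shape: stacking works and gives one (n_patches, 2, 3) array.
same = [arr[0:2, 0:3], arr[4:6, 2:5]]
stacked = np.stack(same)
print(stacked.shape)  # (2, 2, 3)

# Different shapes: np.stack raises ValueError, so keep a plain list instead.
mixed = [arr[0:2, 0:3], arr[1:5, 1:4]]
try:
    np.stack(mixed)
    stack_ok = True
except ValueError:
    stack_ok = False
print(stack_ok, [p.shape for p in mixed])  # False [(2, 3), (4, 3)]
```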
Here is a workaround:
import numpy as np
from itertools import product
def random_patches_2d(arr, n_patches):
    # All possible row and column slices from `arr` given its shape
    row, col = arr.shape
    row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
    col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
    # Pick randomly from the possible slices. The distribution will be
    # random uniform over the given slices. np.random.choice only samples
    # from a 1d array, so we draw indices into the lists of pairs
    # rather than the pairs themselves.
    a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
    b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
    for i, j in zip(a, b):
        # `i` indexes the row slices and `j` the column slices
        yield arr[row_comb[i][0]:row_comb[i][1],
                  col_comb[j][0]:col_comb[j][1]]
Example:
np.random.seed(99)
arr = np.arange(49).reshape(7, 7)
res = list(random_patches_2d(arr, 5))
print(res[0])
print()
print(res[3])
Each element of res is a 2D patch of arr, and the shapes generally differ from one patch to the next because the row and column slices are drawn independently.
Condensed:
def random_patches_2d(arr, n_patches):
    row, col = arr.shape
    row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
    col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
    a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
    b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
    for i, j in zip(a, b):
        yield arr[row_comb[i][0]:row_comb[i][1],
                  col_comb[j][0]:col_comb[j][1]]
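Building every (i, j) pair takes memory that grows quadratically with the axis length, which may matter for large arrays. As a possible alternative, a pair with i < j can be drawn directly: picking two distinct indices without replacement and sorting them is uniform over the same set of pairs. The sketch below assumes NumPy's Generator API, and the name `random_patches_2d_direct` is invented here.

```python
import numpy as np

def random_patches_2d_direct(arr, n_patches, rng=None):
    """Sketch: yield patches arr[i0:i1, j0:j1] with i0 < i1 and j0 < j1,
    each pair of bounds drawn uniformly from range(row) or range(col)."""
    rng = np.random.default_rng() if rng is None else rng
    row, col = arr.shape
    for _ in range(n_patches):
        # Two distinct indices, sorted, give one uniformly chosen pair i0 < i1
        i0, i1 = np.sort(rng.choice(row, size=2, replace=False))
        j0, j1 = np.sort(rng.choice(col, size=2, replace=False))
        yield arr[i0:i1, j0:j1]

arr = np.arange(49).reshape(7, 7)
patches = list(random_patches_2d_direct(arr, 5, rng=np.random.default_rng(99)))
print([p.shape for p in patches])
```

Like the list-based version, the bounds come from range(row) and range(col), so the stop index never exceeds row - 1 or col - 1 and the last row and column are never part of a patch.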
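Addressing your comment: you could add one patch at a time and check the accumulated area after each one.
# `size` is just row x col
area = arr.size
patch_area = 0
patches = []
while patch_area <= area:  # or while patch_area <= 0.1 * area:
    # random_patches_2d is a generator, so pull a single patch out of it
    patch = next(random_patches_2d(arr, n_patches=1))
    patches.append(patch)
    patch_area += patch.size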
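Because patches may overlap, the sum of the patch areas can exceed the area that is actually covered. If the covered fraction is what matters, a boolean mask can track it. The following is a sketch under that assumption; `sample_until_covered`, its `target` argument, and `max_iter` are hypothetical names, and the patch bounds are drawn inline so that the block stands on its own.

```python
import numpy as np

def sample_until_covered(arr, target=0.1, rng=None, max_iter=10_000):
    """Sketch: draw random patches from the 2D array `arr` until at least
    `target` (a fraction between 0 and 1) of its cells is covered."""
    rng = np.random.default_rng() if rng is None else rng
    row, col = arr.shape
    covered = np.zeros(arr.shape, dtype=bool)   # True where some patch landed
    patches = []
    for _ in range(max_iter):                   # guard against endless loops
        if covered.mean() >= target:
            break
        i0, i1 = np.sort(rng.choice(row, size=2, replace=False))
        j0, j1 = np.sort(rng.choice(col, size=2, replace=False))
        covered[i0:i1, j0:j1] = True
        patches.append(arr[i0:i1, j0:j1])
    return patches, covered

arr = np.arange(100 * 100).reshape(100, 100)
patches, covered = sample_until_covered(arr, target=0.1,
                                        rng=np.random.default_rng(0))
print(len(patches), covered.mean())
```

The mean of a boolean mask is the fraction of True cells, so no separate area bookkeeping is needed.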