Stretch an array and fill with nan
I have a 1-D numpy array of length n, and I want to stretch it to length m (n < m), filling the added positions with nan.
For example:
>>> arr = [4,5,1,2,6,8] # take this
>>> stretch(arr,8)
[4,5,np.nan,1,2,np.nan,6,8] # convert to this
Requirements:
1. No nan at either end (if possible)
2. Works for all lengths
I have tried:
>>> def stretch(x,to,fill=np.nan):
...     step = to/len(x)
...     output = np.repeat(fill,to)
...     foreign = np.arange(0,to,step).round().astype(int)
...     output[foreign] = x
...     return output
>>> arr = np.random.rand(6553)
>>> stretch(arr,6622)
File "<ipython-input-216-0202bc39278e>", line 2, in <module>
stretch(arr,6622)
File "<ipython-input-211-177ee8bc10a7>", line 9, in stretch
output[foreign] = x
ValueError: shape mismatch: value array of shape (6553,) could not be broadcast to indexing result of shape (6554,)
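This does not seem to work properly (for an array of length 6553 it violates requirement 2, and requirement 1 is not guaranteed). Are there any clues on how to overcome this?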
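The shape mismatch appears to come from calling np.arange with a non-integer step: rounding error can change how many elements it produces, and the NumPy documentation suggests np.linspace for such cases. A minimal sketch using the lengths from the traceback (the variable names are illustrative only):

```python
import numpy as np

n, m = 6553, 6622  # source length and target length from the traceback

# With a float step, the number of generated indices depends on rounding
# error, so it is not guaranteed to equal n.
idx_arange = np.arange(0, m, m / n).round().astype(int)
print(len(idx_arange))

# np.linspace takes the number of points directly, so the count is always n,
# and the first and last indices are exactly 0 and m - 1.
idx_linspace = np.linspace(0, m - 1, n).round().astype(int)
print(len(idx_linspace), idx_linspace[0], idx_linspace[-1])
```

Because the endpoints are fixed at 0 and m - 1, this also addresses requirement 1.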
You can use resize to resize the array. After resizing, you can apply the appropriate logic to rearrange the contents.
Check the following link:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html
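For reference, here is a small sketch of what the two resize variants do when an array grows. Neither inserts nan by itself, so the rearranging step is still needed afterwards. The array is the example from the question; the names repeated and padded are illustrative.

```python
import numpy as np

a = np.array([4, 5, 1, 2, 6, 8], dtype=float)

# np.resize returns a new array and fills the extra slots with
# repeated copies of the original values.
repeated = np.resize(a, 8)

# ndarray.resize works in place and pads the extra slots with zeros.
# refcheck=False skips the reference-count check, which can otherwise
# raise in interactive sessions.
padded = a.copy()
padded.resize(8, refcheck=False)

print(repeated)
print(padded)
```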
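The following approach places the non-nan elements at the boundaries and leaves the nan values in the center, although it does not space the nan values evenly.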
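import numpy as np

arr = [4,5,1,2,6,8]
stretch_len = 8

def stretch(arr, stretch_len):
    # Start from an array of the target length filled with nan
    stretched_arr = np.empty(stretch_len)
    stretched_arr.fill(np.nan)
    arr_len = len(arr)
    if arr_len % 2 == 0:
        # Even length: first half goes to the front, second half to the back
        mid = int(arr_len/2)
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid:] = arr[-mid:]
    else:
        # Odd length: the back part takes the extra element
        mid = int(np.floor(arr_len/2))
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid-1:] = arr[-mid-1:]
    return stretched_arr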
Here are some test cases I tried:
In [104]: stretch(arr, stretch_len)
Out[104]: array([ 4., 5., 1., nan, nan, 2., 6., 8.])
In [105]: arr = [4, 5, 1, 2, 6, 8, 9]
In [106]: stretch(arr, stretch_len)
Out[106]: array([ 4., 5., 1., nan, 2., 6., 8., 9.])
In [107]: stretch(arr, 9)
Out[107]: array([ 4., 5., 1., nan, nan, 2., 6., 8., 9.])
Using roundrobin from the itertools recipes:
import numpy as np
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

def stretch(x, to, fill=np.nan):
    n_gaps = to - len(x)
    # Split x into n_gaps+1 chunks and interleave one fill value between chunks
    return np.hstack([*roundrobin(np.array_split(x, n_gaps+1), np.repeat(fill, n_gaps))])
arr = [4,5,1,2,6,8]
stretch(arr, 8)
# array([ 4., 5., nan, 1., 2., nan, 6., 8.])
arr2 = np.random.rand(655)
stretched_arr2 = stretch(arr2,662)
np.diff(np.argwhere(np.isnan(stretched_arr2)), axis=0)
# nans are evenly spaced
array([[83],
[83],
[83],
[83],
[83],
[83]])
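The logic behind it:
n_gaps: the number of gaps to fill (desired length - current length).
np.array_split: called with n_gaps+1 sections, it splits the input array into chunks whose lengths are as equal as possible.
roundrobin: since np.array_split produces one more chunk than there are gaps, round-robin (i.e. alternating) iteration guarantees that np.nan is never at either end of the result.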
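To make the splitting step concrete, here is a small sketch of how np.array_split distributes elements when the length is not divisible by the number of chunks, followed by a hand-written interleave that stands in for roundrobin. The sample values and the names chunks, sizes, pieces and result are illustrative only.

```python
import numpy as np

x = np.arange(7, dtype=float)   # 7 elements to stretch
n_gaps = 2                      # e.g. a target length of 9

# n_gaps + 1 chunks; the leading chunks absorb the remainder
chunks = np.array_split(x, n_gaps + 1)
sizes = [len(c) for c in chunks]
print(sizes)                    # [3, 2, 2]

# Alternate chunk, fill, chunk, fill, chunk: a fill value only ever
# sits between two chunks, never at an end.
pieces = []
for i, chunk in enumerate(chunks):
    pieces.append(chunk)
    if i < n_gaps:
        pieces.append(np.array([np.nan]))
result = np.concatenate(pieces)
print(result)
```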
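Although the roundrobin answer solves the problem, I found a shorter solution that may be helpful:
def stretch2(x,to,fill=np.nan):
    output = np.repeat(fill,to)
    # Evenly spaced target positions; the first and last slots are always used
    foreign = np.linspace(0,to-1,len(x)).round().astype(int)
    output[foreign] = x
    return output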
It is very similar to my first attempt. Timing:
>>> x = np.random.rand(1000)
>>> to = 1200
>>> %timeit stretch(x,to) # Chris' version
>>> %timeit stretch2(x,to)
996 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
32.2 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Check that it works correctly:
>>> aa = stretch2(x,to)
>>> np.diff(np.where(np.isnan(aa))[0])
array([6, 6, 6, ... , 6])
>>> np.sum(aa[~np.isnan(aa)] - x)
0.0
Check the boundary conditions:
>>> aa[:5]
array([0.78581616, 0.1630689 , 0.52039993, nan, 0.89844404])
>>> aa[-5:]
array([0.7063653 , nan, 0.2022172 , 0.94604503, 0.91201897])
Both requirements are satisfied. This works for all 1-D arrays and, with a few changes, could be generalized to n-D arrays as well.
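One possible shape for the n-D generalization is sketched below. The function name stretch_axis and its axis parameter are hypothetical and not part of the original answer; the sketch assumes the output should be a float array and that the target length is at least the current length along the chosen axis.

```python
import numpy as np

def stretch_axis(x, to, axis=0, fill=np.nan):
    """Stretch x along one axis to length `to`, padding with `fill`."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    if to < n:
        raise ValueError("target length must be >= current length")
    # Output has the same shape as x except along the stretched axis
    shape = list(x.shape)
    shape[axis] = to
    out = np.full(shape, fill, dtype=float)
    # Same index computation as in the 1-D version
    idx = np.linspace(0, to - 1, n).round().astype(int)
    # Select every position on the other axes, and idx on the stretched axis
    selector = [slice(None)] * x.ndim
    selector[axis] = idx
    out[tuple(selector)] = x
    return out

print(stretch_axis([4, 5, 1, 2, 6, 8], 8))
print(stretch_axis(np.arange(6).reshape(2, 3), 5, axis=1))
```

For 1-D input this reduces to the same indexing as stretch2.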