Extract separate non-zero blocks from array
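Given an array such as:
[1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
What is the fastest way in Python to get the non-zero elements organized in a list, where each element of the list contains the indices of one block of contiguous non-zero values?
For the array above, the result would be a list of arrays:
([0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21])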
>>> L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
>>> import itertools
>>> import operator
>>> [[i for i,value in it] for key,it in itertools.groupby(enumerate(L), key=operator.itemgetter(1)) if key != 0]
[[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
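The one-liner above works because enumerate pairs each value with its index and itertools.groupby then collects consecutive pairs that share the same key. Note that it groups by the value itself, so adjacent but different non-zero values (say 1 followed by 2) would land in separate blocks. A minimal sketch of a variant that groups by "is non-zero" instead is shown here; the name nonzero_blocks is hypothetical and not part of the original answer.

```python
from itertools import groupby


def nonzero_blocks(values):
    """Return one list of indices per run of consecutive non-zero values."""
    blocks = []
    # Key on "value != 0" so that runs of mixed non-zero values stay together.
    for is_nonzero, group in groupby(enumerate(values), key=lambda pair: pair[1] != 0):
        if is_nonzero:
            blocks.append([index for index, _ in group])
    return blocks


L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
print(nonzero_blocks(L))
# [[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
```

This stays in pure Python, so it needs no NumPy, but it visits every element in an interpreted loop and is likely to be slower than the array-based answers on large inputs.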
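Take a look at scipy.ndimage.label (historically also importable as scipy.ndimage.measurements.label; that namespace is deprecated in newer SciPy releases, so the import below uses scipy.ndimage directly):
import numpy as np
from scipy.ndimage import label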
x = np.asarray([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
labelled, numfeats = label(x)
indices = [np.nonzero(labelled == k) for k in np.unique(labelled)[1:]]
indices
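This does exactly what you asked for (each entry of indices is a one-element tuple holding the index array, because np.nonzero returns one array per dimension). Note that, depending on your ultimate goal, labelled may also give you useful (extra) information.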
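To illustrate what labelled contains, here is a short sketch that prints it and then uses scipy.ndimage.find_objects to turn the labels into slices. Using find_objects is an addition for illustration, not part of the original answer, and the variable names slices and blocks are assumptions.

```python
import numpy as np
from scipy.ndimage import label, find_objects

x = np.asarray([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])

# label() numbers each connected non-zero block 1, 2, 3, ...; zeros stay 0.
labelled, numfeats = label(x)
print(labelled)

# find_objects() returns, per label, a tuple with one slice per dimension.
slices = find_objects(labelled)

# For a 1-D input, the single slice gives the start and stop of each block.
blocks = [np.arange(s[0].start, s[0].stop) for s in slices]
print(blocks)
```

Working with slices avoids one full comparison of labelled against every label value, which may matter when the array contains many blocks.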
A trivial change to my answer to "Finding the consecutive zeros in a numpy array" gives the function find_runs:
def find_runs(value, a):
    # Create an array that is 1 where a is `value`, and pad each end with an extra 0.
    isvalue = np.concatenate(([0], np.equal(a, value).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isvalue))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example,
In [43]: x
Out[43]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
In [44]: find_runs(1, x)
Out[44]:
array([[ 0,  4],
       [ 9, 12],
       [14, 16],
       [20, 22]])
In [45]: [list(range(*run)) for run in find_runs(1, x)]
Out[45]: [[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
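If the value 1 in your example is not representative, and you actually want runs of any non-zero values (as the text of the question suggests), you can change np.equal(a, value) to (a != 0) and adjust the arguments and comments accordingly. For example,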
def find_nonzero_runs(a):
    # Create an array that is 1 where a is nonzero, and pad each end with an extra 0.
    isnonzero = np.concatenate(([0], (np.asarray(a) != 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isnonzero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example,
In [63]: y
Out[63]:
array([-1,  2, 99, 99,  0,  0,  0,  0,  0, 12, 13, 14,  0,  0,  1,  1,  0,
        0,  0,  0, 42, 42])
In [64]: find_nonzero_runs(y)
Out[64]:
array([[ 0,  4],
       [ 9, 12],
       [14, 16],
       [20, 22]])
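The rows returned by find_nonzero_runs are half-open (start, stop) pairs, so they can be expanded into the index lists the question asks for, or used directly to slice out the values of each block. The following is a small sketch of both; the names index_blocks and value_blocks are hypothetical.

```python
import numpy as np


def find_nonzero_runs(a):
    # Same function as above: 1 where a is nonzero, padded with a 0 at each end.
    isnonzero = np.concatenate(([0], (np.asarray(a) != 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isnonzero))
    # Runs start and end where absdiff is 1.
    return np.where(absdiff == 1)[0].reshape(-1, 2)


y = np.array([-1, 2, 99, 99, 0, 0, 0, 0, 0, 12, 13, 14, 0, 0, 1, 1, 0, 0, 0, 0, 42, 42])
runs = find_nonzero_runs(y)

# Expand each (start, stop) pair into the indices it covers.
index_blocks = [np.arange(start, stop) for start, stop in runs]

# Or slice the original array to get the values of each block.
value_blocks = [y[start:stop] for start, stop in runs]

print(index_blocks)
print(value_blocks)
```

Keeping the compact (start, stop) form and expanding only when needed avoids materializing every index for long runs.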
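You can use np.split once you know the lengths of the non-zero intervals and the corresponding indices in A. Assuming A is the input array, the implementation would look something like this: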
# Pad the non-zero mask of A with a zero on either side and take differences:
# +1 marks the start of a block, -1 marks one position past its end
A_ext = np.diff(np.hstack(([0], A != 0, [0])))
# Length of each non-zero block
interval_lens = np.where(A_ext == -1)[0] - np.where(A_ext == 1)[0]
# Indices of the non-zero entries of A
idx = np.arange(A.size)[A != 0]
# Split those indices at the cumulative block lengths; the last piece is always empty, so drop it
out = np.split(idx, interval_lens.cumsum())[:-1]
Sample input and output:
In [53]: A
Out[53]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
In [54]: out
Out[54]: [array([0, 1, 2, 3]), array([ 9, 10, 11]), array([14, 15]), array([20, 21])]
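A related variant, offered here as an alternative sketch rather than the method of the answer above, skips the interval lengths and instead splits the non-zero indices wherever two neighbouring indices are not consecutive. The function name split_nonzero_indices is hypothetical.

```python
import numpy as np


def split_nonzero_indices(a):
    """Return one index array per block of consecutive non-zero entries."""
    idx = np.flatnonzero(np.asarray(a))
    if idx.size == 0:
        # No non-zero entries, hence no blocks.
        return []
    # A new block begins wherever the gap to the previous index is not 1.
    breaks = np.flatnonzero(np.diff(idx) != 1) + 1
    return np.split(idx, breaks)


A = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
print(split_nonzero_indices(A))
```

Because the gaps are computed on the indices rather than on the values, this works for arbitrary non-zero values without building a padded copy of the input.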