Speed up numpy filtering
I'm making a music-recognition program, and as part of it I need to find the largest connected regions in a numpy array built from a png (2200x1700 pixels). My current solution is below.
import numpy as np
from scipy import ndimage
from time import time

labels, nlabels = ndimage.label(blobs)
cutoff = len(blobs) * len(blobs[0]) / nlabels
blobs_found = 0
x = []
t1 = time()
for n in range(1, nlabels + 1):
    squares = np.where(labels == n)
    if len(squares[0]) < cutoff:
        blobs[squares] = 0
    else:
        blobs_found += 1
        blobs[squares] = blobs_found
        # shift the blob's coordinates so it starts at the origin
        x.append(squares - np.amin(squares, axis=1, keepdims=True))
nlabels = blobs_found
print(time() - t1)
This works, but it takes about 6.5 seconds to run. Is there a way to remove the loop from this code (or to speed it up some other way)?
You can get the size (in pixels) of every labelled region with:
unique_labels = numpy.unique(labels)
label_sizes = scipy.ndimage.sum(numpy.ones_like(blobs), labels, unique_labels)
The largest one is then:
unique_labels[label_sizes == numpy.max(label_sizes)]
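For reference, here's a minimal self-contained sketch of this approach (assuming blobs is the 2D image array from the question; the background handling is my addition, since label 0 would otherwise win):

import numpy
import scipy.ndimage

labels, nlabels = scipy.ndimage.label(blobs)
unique_labels = numpy.unique(labels)

# Pixel count of every labelled region, including the background (label 0)
label_sizes = scipy.ndimage.sum(numpy.ones_like(blobs), labels, unique_labels)

# Exclude the background before picking the largest region
foreground = unique_labels != 0
largest_label = unique_labels[foreground][numpy.argmax(label_sizes[foreground])]
largest_region = labels == largest_label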
The fastest approach is probably to use numpy.bincount and work from there. Something like:
import numpy as np
from scipy import ndimage

labels, nlabels = ndimage.label(blobs)
cutoff = len(blobs) * len(blobs[0]) / float(nlabels)
# bincount needs a 1D array, so count label occurrences on the flattened image
label_counts = np.bincount(labels.ravel())

# Re-label, taking the cutoff into account
cutoff_mask = (label_counts >= cutoff)
cutoff_mask[0] = False  # label 0 is the background; never keep it
label_mapping = np.zeros_like(label_counts)
label_mapping[cutoff_mask] = np.arange(cutoff_mask.sum()) + 1

# Create an image-array with the updated labels
blobs = label_mapping[labels].astype(blobs.dtype)
This could be optimized further for speed, but I've aimed for readability.
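To sanity-check the vectorized version, you can run it on a toy array (the data below is made up for illustration; with the default 4-connectivity the lone corner pixel is its own region):

import numpy as np
from scipy import ndimage

# One 3x3 blob and one isolated pixel; cutoff = 16 pixels / 2 labels = 8
blobs = np.array([[1, 1, 1, 0],
                  [1, 1, 1, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=np.uint8)

labels, nlabels = ndimage.label(blobs)
cutoff = len(blobs) * len(blobs[0]) / float(nlabels)

label_counts = np.bincount(labels.ravel())
cutoff_mask = label_counts >= cutoff
cutoff_mask[0] = False
label_mapping = np.zeros_like(label_counts)
label_mapping[cutoff_mask] = np.arange(cutoff_mask.sum()) + 1

print(label_mapping[labels].astype(blobs.dtype))
# The 3x3 blob (9 pixels >= 8) survives as label 1; the lone pixel is zeroed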