How to create a binary 3-dimensional matrix of ellipsoids with centers from a list of points in Python?

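Goal

I have an array of M 3-dimensional coordinate points stored as floats. I would like to create a 3-dimensional numpy array of a predefined shape, filled with ellipsoids of given float radii centered on these points. Because this is intended for image processing, I call each value in the array a "pixel". Where ellipsoids overlap, I would like to assign each pixel to the closer center by Euclidean distance. The final output would be a numpy array with a background of 0 and the pixels inside ellipsoids numbered 1, 2, ... M, corresponding to the initial list of coordinates, similar to the output of scipy's ndimage.label(...).

Below is a naive approach that visits every position in the output array and compares it with every defined center, creating a binary array with the value 1 for any pixel inside any ellipsoid. It then uses scikit-image to watershed this binary array. The code works, but it is too slow for my purposes, both because it considers every pixel-center pair and because it runs the watershed as a separate step. How can I speed this code up?

Naive example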

import numpy as np

def define_centromeres(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """ Creates a binary N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: A binary N-dimensional numpy array.
    """
    out = np.full_like(template_image, 0, dtype=int)
    for idx, val in np.ndenumerate(template_image):
        z, x, y = idx
        for point in centers_of_mass:
            pz, px, py = point[0], point[1], point[2]
            if (((z - pz)/zradius)**2 + ((x - px)/xradius)**2 + ((y - py)/yradius)**2) <= 1:
                out[z, x, y] = 1
                break
    return out

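Watershed function from scikit-image; it is unlikely that a speedup can be found by changing the code in this function:

from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_image(binary_input_image):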
    bg_distance = ndi.distance_transform_edt(binary_input_image,
                                             return_distances=True,
                                             return_indices=False)
    local_maxima = peak_local_max(bg_distance, min_distance=1, labels=binary_input_image)
    bg_mask = np.zeros(bg_distance.shape, dtype=bool)
    bg_mask[tuple(local_maxima.T)] = True
    marks, _ = ndi.label(bg_mask)
    output_watershed = watershed(-bg_distance, marks, mask=binary_input_image)
    return output_watershed

Small-scale example data:

zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres(example_shape, example_points)
watershed_spots = watershed_image(center_spots_image)

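Output:

[Image: center_spots_image, max projected to 2d]

[Image: watershed_spots, max projected to 2d]

Note that these images are only 2d representations of the final 3d output array.

Additional notes

The typical size of the output array is 31x512x512, about 8.1e6 values in total, and the typical input is 40 3-dimensional coordinate points. I would like to optimize the speed of this procedure for that scale.

I am using numpy, scipy, and scikit-image in this project, and I must stick to these and other well-maintained, well-documented packages.

I apologize for any errors in the code above or any lack of clarity in my explanation. I am a research scientist with little formal training in computer science.

Very cool project. Thank you. The following code adds an ellipsoid to an existing array. Since my approach does not rely on checking every pixel, I believe the total size of the image does not matter. The run time depends mainly on the number of ellipsoids and their radii. For your example radii it takes about 209 ms ± 8.44 ms per ellipsoid. So if all the other radii are of similar size, your 40 points/ellipsoids should take about 8.36 s (40 × 209 ms). That sounds doable to me.

Furthermore, I believe the symmetry of the ellipsoid can be used to speed things up. Consider the three planes parallel to the coordinate planes that pass through the center of the ellipsoid. These planes divide the ellipsoid into 8 congruent pieces. I believe one could compute just one piece and map it onto the other 7.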

import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

def generate_constraint(center, radii, x=np.nan, y=np.nan, z=np.nan):
    """Constraint keeping the one free coordinate (passed as NaN) inside the ellipsoid."""
    p = np.array([x, y, z], dtype=float)
    A = np.diag(1/np.array(radii))
    # exactly one of x, y, z must be left free
    assert np.sum(np.isnan(p)) == 1
    i = np.where(np.isnan(p))[0][0]

    def constraint(x):
        p[i] = x[0]
        # center is now an explicit argument instead of a global variable
        return np.linalg.norm(A@(p-center))
    return NonlinearConstraint(constraint, -np.inf, 1)

def get_range(x0, x1, center, radii, x=np.nan, y=np.nan, z=np.nan):
    """takes fixed coordinates for two of x,y,z, the center and radii of
    the ellipsoid, and guesses for the range of the third coordinate
    that lies in the ellipsoid, and returns that range"""
    nlc = generate_constraint(center, radii, x, y, z)
    a = minimize(lambda x: x[0], x0=x0, constraints=[nlc], tol=0.1)
    b = minimize(lambda x: -x[0], x0=x1, constraints=[nlc], tol=0.1)
    return int(np.round(a.x.item())), int(np.round(b.x.item()))


def add_ellipsoid(out, center, radii):
    x0, x1 = center[0], center[0]
    y0, y1 = center[1], center[1]
    z0, z1 = center[2], center[2]
    
    x0, x1 = get_range(x0, x1, center, radii, x=np.nan, y=center[1], z=center[2])
    for x in np.arange(x0, x1+1):
        y0,y1 = get_range(y0, y1, center, radii, x=x, z=center[2])
        for y in np.arange(y0,y1+1):
            z0,z1 = get_range(z0, z1, center, radii, x=x, y=y)
            out[x,y,z0:z1+1] = 1
            
n = 25
center = np.array([n,n,n])/2
radii = [4.5, 4.5, 3.5]

out = np.zeros((n,n,n))
add_ellipsoid(out, center, radii)
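
The symmetry idea described above can be sketched roughly as follows. This is only an illustration, not part of the timed code: the function name `ellipsoid_mask_by_symmetry` is made up here, and the sketch assumes the center sits exactly on a voxel. With a float center the eight pieces no longer line up with the grid, so the mirrored mask would only be an approximation.

```python
import numpy as np

def ellipsoid_mask_by_symmetry(radii):
    """Boolean mask of an ellipsoid centered on the middle voxel of its own box.

    Hypothetical helper: assumes the center lies exactly on a voxel.
    """
    radii = np.asarray(radii, dtype=float)
    half = np.floor(radii).astype(int)          # voxels from center to edge, per axis
    # Evaluate the ellipsoid test on non-negative offsets only (one octant).
    grids = np.ogrid[tuple(slice(0, int(h) + 1) for h in half)]
    octant = sum((g / r) ** 2 for g, r in zip(grids, radii)) <= 1
    # Mirror along each axis in turn; [1:] drops the duplicated center plane.
    mask = octant
    for axis in range(mask.ndim):
        rest = [slice(None)] * mask.ndim
        rest[axis] = slice(1, None)
        mask = np.concatenate([np.flip(mask, axis=axis), mask[tuple(rest)]], axis=axis)
    return mask

mask = ellipsoid_mask_by_symmetry((3.5, 4.5, 4.5))   # (zradius, xradius, yradius)
print(mask.shape)   # (7, 9, 9)
```

The resulting mask could then be pasted into the output array at the desired position, which computes each ellipsoid test only once per octant voxel.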

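The golden rule for speeding up numpy code is to vectorize as much as possible. This moves loops out of Python and into faster C routines. The main slowdown here comes from iterating over every element of the array with np.ndenumerate. You can do this entirely in numpy using np.indices, plus some broadcasting to make the arrays line up: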

def define_centromeres_vec(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a binary N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: A binary N-dimensional numpy array.
    """
    out = np.zeros_like(template_image, dtype=int)
    # indices[:, z, x, y] = [z, x, y]
    indices = np.indices(template_image.shape, dtype=float)
    radii = np.asarray([zradius, xradius, yradius], dtype=float)
    for point in centers_of_mass:
        mask = np.sum(((indices - point[:,None,None,None]) / radii[:,None,None,None]) ** 2, axis=0) <= 1
        out[mask] = 1
    return out

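This gives me a speedup of roughly 60x on the example data: from 2.83s ± 116ms down to 44.4ms ± 555µs, including the watershed.

We can get even faster by moving the broadcasting of radii outside the loop: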
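
To see what np.indices and the `[:, None, None, None]` reshaping do, here is a small self-contained check on a toy grid. The grid size, center and radii are made-up values for illustration only; the check compares the broadcast mask for a single center against an explicit per-voxel loop.

```python
import numpy as np

shape = (4, 6, 6)                       # small (z, x, y) grid, illustration only
center = np.array([1.5, 2.0, 3.0])
radii = np.array([1.5, 2.5, 2.5])       # (zradius, xradius, yradius)

# indices[:, z, x, y] == [z, x, y]; the array has shape (3, 4, 6, 6).
indices = np.indices(shape, dtype=float)
# Reshape the length-3 vectors to (3, 1, 1, 1) so they broadcast over the grid.
scaled = (indices - center[:, None, None, None]) / radii[:, None, None, None]
mask_vec = np.sum(scaled ** 2, axis=0) <= 1

# Reference: the same ellipsoid test written as an explicit Python loop.
mask_loop = np.zeros(shape, dtype=bool)
for z, x, y in np.ndindex(*shape):
    mask_loop[z, x, y] = (((z - center[0]) / radii[0]) ** 2
                          + ((x - center[1]) / radii[1]) ** 2
                          + ((y - center[2]) / radii[2]) ** 2) <= 1

print(indices.shape, np.array_equal(mask_vec, mask_loop))
```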

    # ...
    radii_bcast = np.broadcast_to(radii[:, None, None, None], shape=indices.shape)
    for point in centers_of_mass:
        mask = np.sum(((indices - point[:,None,None,None]) / radii_bcast) ** 2, axis=0) <= 1
        out[mask] = 1
    return out

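This gives another factor of about 2 for define_centromeres, but a large share of the total time is now spent in the watershed routine (21ms ± 60µs).

Finally, we can eliminate the watershed step by classifying the pixels in the same loop. Instead of setting out[mask] to 1, we store the index of the COM (plus 1). Before doing so, we check for overlap with previous ellipsoids by combining mask with the nonzero entries of out (the result is True wherever the current ellipsoid overlaps a previous one). For any overlapping points, we can use scipy.spatial.KDTree to get the closest COM for each point: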
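
As a side note on what np.broadcast_to returns, the following small sketch (array sizes chosen to match the full-size data; everything else is illustrative) shows that the result is a read-only view onto the original three values, not a full-size copy:

```python
import numpy as np

radii = np.array([3.5, 4.5, 4.5])
shape = (3, 31, 512, 512)               # same layout as np.indices output

radii_bcast = np.broadcast_to(radii[:, None, None, None], shape)

print(radii_bcast.shape)                      # (3, 31, 512, 512)
print(radii_bcast.strides)                    # zero strides on the broadcast axes
print(np.shares_memory(radii_bcast, radii))   # still backed by the 3 original floats
print(radii_bcast.flags.writeable)            # broadcast views are read-only
```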

from scipy.spatial import KDTree

def define_centromeres_labeled(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a labeled N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: An N-dimensional numpy array, with label `n` for the ellipsoid at index `n-1`.
    """
    out = np.zeros_like(template_image, dtype=int)
    # indices[:, z, x, y] = [z, x, y]
    indices = np.indices(template_image.shape, dtype=float)
    radii = np.asarray([zradius, xradius, yradius], dtype=float)
    radii_bcast = np.broadcast_to(radii[:, None, None, None], shape=indices.shape)
    tree = KDTree(centers_of_mass)
    for i, point in enumerate(centers_of_mass, start=1):
        mask = np.sum(((indices - point[:,None,None,None]) / radii_bcast) ** 2, axis=0) <= 1
        # Overlap with previously placed ellipsoids. Compare against out != 0:
        # mask & out is a bitwise AND of 0/1 with the labels, which misses even labels.
        overlap_mask = mask & (out != 0)
        if np.any(overlap_mask):
            overlap_idx = np.array(np.where(overlap_mask)).T
            out[mask] = i
            # get the closest center for all overlapping points
            out[overlap_mask] = tree.query(overlap_idx)[1] + 1
        else:
            out[mask] = i
    return out

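This is about twice as fast as the previous approach (20.2ms ± 384µs), with the added advantage that the ellipsoids are labeled according to their index and adjacent ellipsoids are not merged together (which is a problem with the watershed).

On my machine, it runs the full-size example data (31x512x512, 40 ellipsoids) in 4.38s ± 43.9ms.

There is indeed some room for improvement, both in the algorithm and in the implementation. @Eric Johnson has already covered the latter, so allow me to demonstrate another 20x speedup on the example by using a better algorithm.

Improvements: 1. Before masking, restrict the computation to an easily computed bounding box. 2. For overlap resolution, reuse the distance computation that was already done for the ellipsoid test.

Code (assumes Eric's functions are already defined):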
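
For readers unfamiliar with KDTree.query, here is a minimal standalone illustration of the lookup used for the overlapping pixels. The two centers and the two voxel coordinates are made-up values; query returns the distances and the 0-based indices of the nearest centers, so adding 1 gives the labels.

```python
import numpy as np
from scipy.spatial import KDTree

# Two hypothetical centers, 4 voxels apart along the last axis.
centers = np.array([[5.0, 5.0, 5.0],
                    [5.0, 5.0, 9.0]])
tree = KDTree(centers)

# Voxel coordinates assumed to lie in the region shared by both ellipsoids.
overlap_idx = np.array([[5, 5, 6],
                        [5, 5, 8]])
distances, nearest = tree.query(overlap_idx)
labels = nearest + 1        # labels are 1-based, center indices are 0-based

print(labels)               # [1 2]
```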

import numpy as np

def rasterise(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a labeled N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: An N-dimensional numpy array, with label `n` for the ellipsoid at index `n-1`.
    """
    sh = template_image.shape
    out = np.zeros(sh,int)
    aux = np.zeros(sh)
    radii = np.array([zradius,xradius,yradius])
    for j,com in enumerate(centers_of_mass,1):
        bboxl = np.floor(com-radii).clip(0,None).astype(int)
        bboxh = (np.ceil(com+radii)+1).clip(None,sh).astype(int)
        roi = out[tuple(map(slice,bboxl,bboxh))]
        roiaux = aux[tuple(map(slice,bboxl,bboxh))]
        logrid = *map(np.square,np.ogrid[tuple(
            map(slice,(bboxl-com)/radii,(bboxh-com-1)/radii,1j*(bboxh-bboxl)))]),
        dst = (1-sum(logrid)).clip(0,None)
        mask = dst>roiaux
        roi[mask] = j
        np.copyto(roiaux,dst,where=mask)
    return out
        

    
    
zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres_labeled(example_shape, example_points)
csi = rasterise(example_shape, example_points)
print("number of pixels different",np.count_nonzero(csi != center_spots_image),"out of",csi.size)
from timeit import timeit
print("Eric",timeit(lambda:define_centromeres_labeled(example_shape, example_points),number=10))
print("loopy",timeit(lambda:rasterise(example_shape, example_points),number=10))

Sample run:

number of pixels different 0 out of 150000
Eric 0.37984768400201574
loopy 0.019632569048553705
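
The first improvement, the bounding box, can be checked in isolation. The sketch below is not the code timed above: `ellipsoid_bbox` is a made-up helper name, and the center is an arbitrary value placed near the image edge so that the clipping is exercised. It confirms that the ellipsoid test evaluated on the box alone finds the same voxels as the test on the full grid.

```python
import numpy as np

def ellipsoid_bbox(center, radii, shape):
    """Slices of the axis-aligned box around the ellipsoid, clipped to the image."""
    center = np.asarray(center, dtype=float)
    radii = np.asarray(radii, dtype=float)
    lo = np.floor(center - radii).clip(0, shape).astype(int)
    hi = (np.ceil(center + radii) + 1).clip(0, shape).astype(int)
    return tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))

shape = (15, 100, 100)
center = np.array([7.3, 42.8, 97.1])    # near the y edge, so clipping kicks in
radii = np.array([3.5, 4.5, 4.5])       # (zradius, xradius, yradius)
box = ellipsoid_bbox(center, radii, shape)

# Ellipsoid test on the box only, using open (broadcastable) grids.
grids = np.ogrid[box]
mask_box = sum(((g - c) / r) ** 2 for g, c, r in zip(grids, center, radii)) <= 1

# Same test on the full grid, for comparison.
full = np.ogrid[tuple(slice(0, n) for n in shape)]
mask_full = sum(((g - c) / r) ** 2 for g, c, r in zip(full, center, radii)) <= 1

print(box)
print(mask_box.size, "voxels tested instead of", mask_full.size)
```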

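Caveats:

The overlap resolution differs slightly between Eric's code and mine. For example:

The difference is (I think) that Eric (top image) uses the standard Euclidean metric, whereas I (bottom image) use the metric suggested by the ellipsoids, mostly out of opportunism, but also because it may even be the right thing to do. Switching to the Euclidean metric is possible but would cost some speed.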
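
For what it is worth, here is a rough sketch of what switching to the Euclidean metric might look like. It is not the code benchmarked above: `rasterise_euclidean` and its signature are made up for illustration, and the extra cost comes from computing a second distance array per ellipsoid. Contested voxels go to the center with the smaller squared Euclidean distance; ties keep the earlier label.

```python
import numpy as np

def rasterise_euclidean(shape, centers, radii):
    """Label ellipsoids; contested voxels go to the Euclidean-nearest center.

    Hypothetical variant for illustration, with radii given as (z, x, y).
    """
    radii = np.asarray(radii, dtype=float)
    out = np.zeros(shape, dtype=int)
    best = np.full(shape, np.inf)   # squared Euclidean distance to the assigned center
    for label, com in enumerate(np.asarray(centers, dtype=float), start=1):
        # Bounding box of this ellipsoid, clipped to the image.
        lo = np.floor(com - radii).clip(0, shape).astype(int)
        hi = (np.ceil(com + radii) + 1).clip(0, shape).astype(int)
        box = tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))
        grids = np.ogrid[box]
        # Ellipsoid membership uses the scaled metric...
        inside = sum(((g - c) / r) ** 2 for g, c, r in zip(grids, com, radii)) <= 1
        # ...while the competition between centers uses the plain Euclidean one.
        dist2 = sum((g - c) ** 2 for g, c in zip(grids, com))
        take = inside & (dist2 < best[box])
        out[box][take] = label          # out[box] is a view, so this writes into out
        best[box][take] = dist2[take]
    return out

labels = rasterise_euclidean((11, 11, 14), [[5, 5, 5], [5, 5, 8]], (3, 3, 3))
print(labels[5, 5, 6], labels[5, 5, 7])   # 1 2
```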