Efficient sum of Gaussians in 3D with NumPy using large arrays
I have an M x 3 array of 3D coordinates, coords (M ~1000-10000), and I want to compute a 3D grid array of the sum of Gaussians centered at those coordinates. The grid is typically 64 x 64 x 64, but sometimes exceeds 256 x 256 x 256 and can be even larger. I started by following this question, converting my meshgrid arrays into an N x 3 array of coordinates, xyz, where N is 64^3, 256^3, etc. However, for large array sizes it takes too much memory to vectorize the entire calculation (understandably, since it could approach 1e11 elements and consume a terabyte of RAM), so I broke it up into a loop over the M coordinates. But that is too slow.
I'm wondering if there is any way to speed this up without overloading memory. By converting the meshgrid to xyz, I feel like I've lost any advantage of the grid's equal spacing, and that somehow, maybe with scipy.ndimage, I should be able to take advantage of the even spacing to speed things up.
Here's my initial start:
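The even spacing can in fact be exploited roughly along those lines. A minimal sketch, assuming it is acceptable to snap each point to its nearest grid voxel: accumulate unit impulses into a 3D histogram and blur them with scipy.ndimage.gaussian_filter. Note that gaussian_filter normalizes its kernel to sum to 1, so the amplitude scale differs from the explicit 1/sqrt(2*pi*sigma**2) formula below; this is an approximation, not a drop-in replacement.

```python
import numpy as np
from scipy import ndimage

side = 100.0
n = 64
sigma = 1.0
spacing = side / (n - 1)  # voxel size of the np.linspace grid

coords = np.random.random(size=(1000, 3)) * side - side / 2

# snap each coordinate to its nearest voxel index
idx = np.rint((coords + side / 2) / spacing).astype(int)
idx = np.clip(idx, 0, n - 1)

# accumulate unit impulses, then blur; sigma must be given in voxel units
impulses = np.zeros((n, n, n))
np.add.at(impulses, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
image = ndimage.gaussian_filter(impulses, sigma=sigma / spacing)
```

Since the blur is a separable convolution, this costs O(n^3) regardless of M, at the price of quantizing each center to the grid.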
import numpy as np
from scipy import spatial
#create meshgrid
side = 100.
n = 64 #could be 256 or larger
x_ = np.linspace(-side/2,side/2,n)
x,y,z = np.meshgrid(x_,x_,x_,indexing='ij')
#convert meshgrid to list of coordinates
xyz = np.column_stack((x.ravel(),y.ravel(),z.ravel()))
#create some coordinates
coords = np.random.random(size=(1000,3))*side - side/2
def sumofgauss(coords, xyz, sigma):
    """Simple isotropic gaussian sum at coordinate locations."""
    n = int(round(xyz.shape[0]**(1/3.)))  # recover n for reshaping to 3D later
    # this fully vectorized version overloads memory:
    # dist = spatial.distance.cdist(coords, xyz)
    # dist *= dist
    # values = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist/(2*sigma**2))
    # values = np.sum(values, axis=0)
    # so run cdist in a loop over coords to avoid overloading memory
    values = np.zeros(xyz.shape[0])
    for i in range(coords.shape[0]):
        dist = spatial.distance.cdist(coords[None, i], xyz)
        dist *= dist
        values += 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist[0]/(2*sigma**2))
    return values.reshape(n, n, n)
image = sumofgauss(coords,xyz,1.0)
import matplotlib.pyplot as plt
plt.imshow(image[n//2]) #show a slice
plt.show()
M = 1000, N = 64 (~5 seconds):
M = 1000, N = 256 (~10 minutes):
Considering that many of your distance calculations will give zero weight after the exponential, you can probably drop most of them. It is usually faster to do distance calculations in big chunks while discarding any distances greater than a threshold, using a KDTree:
import numpy as np
from scipy.spatial import cKDTree # so we can get a `coo_matrix` output
def gaussgrid(coords, sigma=1, n=64, side=100, eps=None):
    x_ = np.linspace(-side/2, side/2, n)
    x, y, z = np.meshgrid(x_, x_, x_, indexing='ij')
    xyz = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
    if eps is None:
        eps = np.finfo('float64').eps
    # drop all pairs whose Gaussian weight would fall below eps:
    # exp(-d**2/(2*sigma**2)) >= eps  <=>  d <= sigma*sqrt(-2*log(eps))
    thr = sigma * np.sqrt(-2 * np.log(eps))
    data_tree = cKDTree(coords)
    discr = 1000  # chunk size; you can tweak this to get best results on your system
    values = np.empty(n**3)
    for i in range(n**3 // discr + 1):
        slc = slice(i * discr, i * discr + discr)
        grid_tree = cKDTree(xyz[slc])
        dists = grid_tree.sparse_distance_matrix(data_tree, thr, output_type='coo_matrix')
        dists.data = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dists.data**2/(2*sigma**2))
        values[slc] = np.asarray(dists.sum(1)).ravel()
    return values.reshape(n, n, n)
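As a quick sanity check of how sparse_distance_matrix behaves, here is a toy example with two hypothetical point sets: only pairs closer than the threshold appear in the sparse output, so the far pair costs nothing downstream.

```python
import numpy as np
from scipy.spatial import cKDTree

# two small hypothetical point sets
a = cKDTree(np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]))
b = cKDTree(np.array([[1.0, 0.0, 0.0]]))

# only pairs with distance <= 3.0 are stored
d = a.sparse_distance_matrix(b, 3.0, output_type='coo_matrix')
print(d.toarray())  # [[1.], [0.]] -- the pair 10 apart is dropped
```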
Now, even if you keep eps = None it will be somewhat faster, since you are still discarding most of the pairwise distances, but with eps = 1e-6 or so you should get a big speedup. On my system:
%timeit out = sumofgauss(coords, xyz, 1.0)
1 loop, best of 3: 23.7 s per loop
%timeit out = gaussgrid(coords)
1 loop, best of 3: 2.12 s per loop
%timeit out = gaussgrid(coords, eps = 1e-6)
1 loop, best of 3: 382 ms per loop
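As a rough back-of-the-envelope check of why eps matters (assuming the cutoff is placed where exp(-d**2/(2*sigma**2)) falls to eps), here is the fraction of the 100-unit box that lies within the cutoff radius of any one grid point:

```python
import numpy as np

sigma, side = 1.0, 100.0
for eps in (np.finfo('float64').eps, 1e-6):
    r = sigma * np.sqrt(-2 * np.log(eps))   # radius where the weight drops to eps
    frac = (4/3) * np.pi * r**3 / side**3   # fraction of the box inside the cutoff
    print(f"eps={eps:.0e}: cutoff radius {r:.2f}, volume fraction {frac:.2e}")
```

Either way only a tiny fraction of the M x N pairs survive, which is why the KDTree query dominates neither memory nor time.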