Convolve array with kernel of variable standard deviation
Good day fellow programmers!

Today I want to do something I believe is tricky. I have a very large 2D array called tac that basically contains time curve values, and a file with coordinate tuples called coor that holds the information about where to place those curves in a 3D array. Together, these variables actually represent a 4D array: the first 3 dimensions are the spatial dimensions and the fourth is time. The whole thing is stored this way to avoid storing a large number of zeros.

I would like to apply a Gaussian kernel to this data set for each time (in other words, for each value along the 4th dimension). I was able to generate the kernel and perform the convolution for the whole array quite easily with scipy.ndimage.convolve for a fixed standard deviation. The kernel was created with scipy.signal.gaussian. Here is a short example of the principle, where tac_4d contains the 4D array (it stores a lot of data, I know... but one problem at a time):
import numpy as np
from scipy import signal
from scipy import ndimage as nd

def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    # note: in recent SciPy versions this window lives at scipy.signal.windows.gaussian
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    kernel_3d = np.expand_dims(kernel_3d, -1)  # trailing axis broadcasts over time
    return kernel_3d

g = gaussian_kernel_3d(1, .5)
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0)
Now the tricky part is to convolve the array with a kernel whose standard deviation differs for each SPACE coordinate. In other words, I would have a 3D array std containing a standard deviation for each coordinate of the array.

It seems that https://github.com/sheliak/varconvolve has the code needed to solve this problem. However, I don't really understand how to use it, and frankly I would prefer to work out a proper solution. Do you have any idea how to solve this problem?

Thanks in advance!
EDIT

Here is the MCVE that I hope can be considered:
import numpy as np
from scipy import signal
from scipy import ndimage as nd

def gaussian_kernel_2d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    return kernel_2d

def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    kernel_3d = np.expand_dims(kernel_3d, -1)
    return kernel_3d

np.random.seed(0)

number_of_tac = 150
time_samples = 915
z, y, x = 100, 150, 100
voxel_number = x * y * z

# TACs in the right order
tac = np.random.uniform(0, 4, time_samples * number_of_tac).reshape(number_of_tac, time_samples)

arr = np.array([0] * (voxel_number - number_of_tac) + [1] * number_of_tac)
np.random.shuffle(arr)
arr = arr.reshape(z, y, x)
coor = np.where(arr != 0)  # non-empty voxels

# Algorithm to place the TACs in 3D space
nnz = np.zeros(arr.shape)
nnz[coor] = 1
tac_4d = np.zeros((x, y, z, time_samples))  # indexing below works because x == z here
tac_4d[np.where(nnz == 1)] = tac

# 3D convolution for all times
# TODO: find a way to make the standard deviation change for each voxel
g = gaussian_kernel_3d(1, 1)  # 3D kernel of std = 1
v = np.random.uniform(0, 1, x * y * z).reshape(z, y, x)  # 3D array of std values (not yet used)
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0)  # convolution
Essentially, you have a 4D dataset of shape (nx, ny, nz, nt) that is sparse in the (nx, ny, nz) axes and dense in the nt axis. If (i, j, k) are the coordinates of the non-zero points in the sparse dimensions, you want to convolve with a 3D Gaussian kernel whose sigma depends on (i, j, k).

For example, if there are non-zero points at [1, 2, 5] and [1, 4, 5], with corresponding sigmas 0.1 and 1.0, then the output at coordinate [1, 3, 5] is affected mostly by the [1, 4, 5] point, because that one has the largest point spread.

Your question is ambiguous; it could also mean that point [1, 3, 5] has its own associated sigma, for example 0.5, and pulls data from the two neighboring points with equal weights. I will assume the first definition (sigma values associated with the input points, not with the output points).
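To make the distinction concrete, here is a minimal 1D sketch of the first interpretation (positions and sigmas are made up to mirror the example above, and each point carries the value 1): every non-zero input point spreads its own value with its own sigma, and the contributions are summed.

import numpy as np
import scipy.signal

out = np.zeros(8)
points = {2: 0.1, 4: 1.0}  # position -> sigma (illustrative values)
m = 2                      # kernel half-width
for pos, sigma in points.items():
    k = scipy.signal.gaussian(2*m + 1, std=sigma)
    k /= k.sum()           # normalize so each point contributes its full value
    out[pos - m : pos + m + 1] += k

print(out.round(3))  # index 3 is dominated by the wide (sigma=1.0) point at 4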
Because the operation is not a true convolution, there is no fast FFT-based method to do the entire operation in one pass. Instead, you have to loop over the sigma values. Fortunately, your example has only 150 non-zero points, so the loop is not too expensive.

Here is an implementation. I keep the data in the sparse representation as long as possible.
import scipy.signal
import numpy as np

def kernel3d(mm, sigma):
    """Return (mm, mm, mm) shaped, normalized kernel."""
    g1 = scipy.signal.gaussian(mm, std=sigma)
    g3 = g1.reshape(mm, 1, 1) * g1.reshape(1, mm, 1) * g1.reshape(1, 1, mm)
    return g3 * (1/g3.sum())

np.random.seed(1)
s = 2  # scaling factor (original problem: s=10)
nx, ny, nz, nt, nnz = 10*s, 11*s, 12*s, 91*s, 15*s

# select nnz random voxels to fill with time series data
randint = np.random.randint
tseries = {}  # key: (i, j, k) tuple; value: time series data, shape (nt,)
for _ in range(nnz):
    while True:
        ijk = (randint(nx), randint(ny), randint(nz))
        if ijk not in tseries:
            tseries[ijk] = np.random.uniform(0, 1, size=nt)
            break
ijks = np.array(list(tseries.keys()))  # shape (nnz, 3)

# sigmas: key: (i, j, k) tuple; value: standard deviation
sigmas = {k: np.random.uniform(0, 2) for k in tseries.keys()}

# output will be stored as a dense array, padded to avoid edge issues
# with the convolution.
m = 5  # padding size
cag_4dp = np.zeros((nx+2*m, ny+2*m, nz+2*m, nt))
mm = 2*m + 1  # kernel width
for (i, j, k), tdata in tseries.items():
    kernel = kernel3d(mm, sigmas[(i, j, k)]).reshape(mm, mm, mm, 1)
    # convolution of one voxel with the kernel is trivial;
    # slice4d_c has shape (mm, mm, mm, nt).
    slice4d_c = kernel * tdata
    cag_4dp[i:i+mm, j:j+mm, k:k+mm, :] += slice4d_c

cag_4d = cag_4dp[m:-m, m:-m, m:-m, :]
#%%
import matplotlib.pyplot as plt

plt.close('all')
fig, axs = plt.subplots(2, 2, tight_layout=True)

# find a few planes that contain source voxels
# ks = np.where(np.any(cag_4d != 0, axis=(0, 1, 3)))[0]
ks = ijks[:4, 2]
for ax, k in zip(axs.ravel(), ks):
    ax.imshow(cag_4d[:, :, k, nt//2].T)
    ax.set_title(f'Plane [:, :, {k}] at time {nt//2}')
fig.show()

for ijk, sigma in sigmas.items():
    print(f'{ijk}: sigma={sigma:.2f}')
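Since kernel3d normalizes each kernel to sum to 1, and the padding is wide enough to contain every kernel footprint entirely, the total intensity per time point should be preserved in the padded output. A quick sanity check, using the variables defined above:

total_in = sum(tseries.values())          # summed input signal, shape (nt,)
total_out = cag_4dp.sum(axis=(0, 1, 2))   # summed padded output, shape (nt,)
print(np.allclose(total_in, total_out))   # expect True: total mass is conserved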