去噪图像分割(模式过滤?) - 或者如何在 numpy 中矢量化这个操作?
denoise image segmentation (mode filtering?) - or how to vectorize this operation in numpy?
更新:在我最初的 post 中,我愚蠢地应用 stats.mode
补丁而不是沿着补丁的轴。解决这个问题使我的速度提高了 4 倍。但是它仍然很慢,我最初的问题仍然存在:(1) 我可以提高速度吗? (2) 是否有 different/better/standard 方法来清理嘈杂的分类数据?返回 post:
我有一些有噪声的图像分割结果,我想清理它。我的想法是采用 (3,3) 补丁的模式值。此代码有效,但速度太慢。:
from sklearn.feature_extraction import image
from scipy import stats
def _mode(a,axis=None):
m,_=stats.mode(a,axis=axis)
return m
def mode_smoothing(data,kernel=(3,3)):
patches=image.extract_patches_2d(data,kernel)
nb_patches=patches.shape[0]
patches=patches.reshape(nb_patches,-1)
return _mode(patches,1).reshape(int(np.sqrt(nb_patches)),-1)
""" original method (new version is ~ 5 times faster, but still slow)
def _mode(arr):
m,_=stats.mode(arr,axis=None)
return m
def mode_smoothing(data,kernel=(3,3)):
patches=image.extract_patches_2d(data,kernel)
nb_patches=patches.shape[0]
w=int(np.sqrt(nb_patches))
o=np.array([_mode(patches[p]) for p in range(nb_patches)])
return o.reshape(w,-1)
"""
问题:
- 有没有更快的方法来做到这一点?在 numpy 中消除 loop/vectorize?直接移植到 c 还是使用 numba 等?我努力让一些东西沿着这些路径工作
- 是否有更好/更标准的方法来对分类图像数据完成这样的去噪?
这是来自上述 mode_smoothing
方法的 before/after 示例
下面我给出了 2 个答案来回答我的问题:
- 通过将我最初的尝试扩展为 numba 可以处理的函数
- 使用 Alex Alex 的上述建议,我称之为 "categorical-smoothing"(此方法有标准名称吗?)
我还没有写出数学证明,但看起来这种 patch-wise 模式平滑等同于正确选择参数的分类平滑。两者都会带来很大的速度提升,但分类平滑解决方案更干净、更快,并且不涉及 numba - 所以它赢了。
NUMBA
@njit
def mode_smoothing(data,kernel=(3,3),step=(1,1),edges=False,high_value=False,center_boost=False):
""" mode smoothing over patches
Args:
data<np.array>: numpy array (ndim=2)
kernel<tuple[int]>: (height,width) of patch window
step<tuple[int]>: (y-step,x-step)
edges<bool>:
- if true
* include edge patches by taking mode over smaller patch window
* the returned image be the same shape as the input data
- if false
* only run over patches with the full kernel size
* the returned image will be reduced in size by the radius of the kernel
high_value<bool>:
when there are multiple possible mode values choose the highest if true,
otherwise choose the lowest value
center_boost<int|bool>:
if true, instead of using pure mode-value increase the count on the center pixel
Return
<np.array> of patch wise mode values. shape my be different than input. see `edges` above
"""
h,w=data.shape
ry=int(kernel[0]//2)
rx=int(kernel[1]//2)
sy,sx=step
_mode_vals=[]
if edges:
j0,j1=0,h
i0,i1=0,w
else:
j0,j1=ry,h-ry
i0,i1=rx,w-rx
for j in range(j0,j1,sy):
for i in range(i0,i1,sx):
ap=data[
max(j-ry,0):j+ry+1,
max(i-rx,0):i+rx+1]
cv=data[j,i]
values=np.unique(ap)
count=0
for v in values:
newcount=(ap==v).sum()
if center_boost and (v==cv):
newcount+=center_boost
if high_value:
test=newcount>=count
else:
test=newcount>count
if test:
count=newcount
mode_value=v
_mode_vals.append(mode_value)
return np.array(_mode_vals).reshape(j1-j0,i1-i0)
分类平滑
from scipy.signal import convolve2d
KERNEL=np.ones((3,3))
def categorical_smoothing(data,nb_categories,kernel=KERNEL):
data=np.eye(nb_categories)[:,data]
for i in range(nb_categories):
data[i]=convolve2d(data[i],kernel,mode='same')
return data.argmax(axis=0)
EQUIVALENCE/SPEED-CHECK
这可能很容易证明,但是...
S=512
N=19
data=np.random.randint(0,N,(S,S))
%time o1=mode_smoothing(data,edges=True,center_boost=False)
kernel=np.ones((3,3))
%time o2=categorical_smoothing(data,N,kernel=kernel)
print((o1==o2).all())
print()
data=np.random.randint(0,N,(S,S))
%time o1=mode_smoothing(data,edges=True,center_boost=1)
kernel=np.ones((3,3))
kernel[1,1]=2
%time o2=categorical_smoothing(data,N,kernel=kernel)
print((o1==o2).all())
""" OUTPUT
CPU times: user 826 ms, sys: 0 ns, total: 826 ms
Wall time: 825 ms
CPU times: user 416 ms, sys: 7.83 ms, total: 423 ms
Wall time: 423 ms
True
CPU times: user 825 ms, sys: 3.78 ms, total: 829 ms
Wall time: 828 ms
CPU times: user 422 ms, sys: 3.91 ms, total: 426 ms
Wall time: 425 ms
True
"""
更新:在我最初的 post 中,我愚蠢地应用 stats.mode
补丁而不是沿着补丁的轴。解决这个问题使我的速度提高了 4 倍。但是它仍然很慢,我最初的问题仍然存在:(1) 我可以提高速度吗? (2) 是否有 different/better/standard 方法来清理嘈杂的分类数据?返回 post:
我有一些有噪声的图像分割结果,我想清理它。我的想法是采用 (3,3) 补丁的模式值。此代码有效,但速度太慢。:
from sklearn.feature_extraction import image
from scipy import stats
def _mode(a,axis=None):
m,_=stats.mode(a,axis=axis)
return m
def mode_smoothing(data,kernel=(3,3)):
patches=image.extract_patches_2d(data,kernel)
nb_patches=patches.shape[0]
patches=patches.reshape(nb_patches,-1)
return _mode(patches,1).reshape(int(np.sqrt(nb_patches)),-1)
""" original method (new version is ~ 5 times faster, but still slow)
def _mode(arr):
m,_=stats.mode(arr,axis=None)
return m
def mode_smoothing(data,kernel=(3,3)):
patches=image.extract_patches_2d(data,kernel)
nb_patches=patches.shape[0]
w=int(np.sqrt(nb_patches))
o=np.array([_mode(patches[p]) for p in range(nb_patches)])
return o.reshape(w,-1)
"""
问题:
- 有没有更快的方法来做到这一点?在 numpy 中消除 loop/vectorize?直接移植到 c 还是使用 numba 等?我努力让一些东西沿着这些路径工作
- 是否有更好/更标准的方法来对分类图像数据完成这样的去噪?
这是来自上述 mode_smoothing
方法的 before/after 示例
下面我给出了 2 个答案来回答我的问题:
- 通过将我最初的尝试扩展为 numba 可以处理的函数
- 使用 Alex Alex 的上述建议,我称之为 "categorical-smoothing"(此方法有标准名称吗?)
我还没有写出数学证明,但看起来这种 patch-wise 模式平滑等同于正确选择参数的分类平滑。两者都会带来很大的速度提升,但分类平滑解决方案更干净、更快,并且不涉及 numba - 所以它赢了。
NUMBA
@njit
def mode_smoothing(data,kernel=(3,3),step=(1,1),edges=False,high_value=False,center_boost=False):
""" mode smoothing over patches
Args:
data<np.array>: numpy array (ndim=2)
kernel<tuple[int]>: (height,width) of patch window
step<tuple[int]>: (y-step,x-step)
edges<bool>:
- if true
* include edge patches by taking mode over smaller patch window
* the returned image be the same shape as the input data
- if false
* only run over patches with the full kernel size
* the returned image will be reduced in size by the radius of the kernel
high_value<bool>:
when there are multiple possible mode values choose the highest if true,
otherwise choose the lowest value
center_boost<int|bool>:
if true, instead of using pure mode-value increase the count on the center pixel
Return
<np.array> of patch wise mode values. shape my be different than input. see `edges` above
"""
h,w=data.shape
ry=int(kernel[0]//2)
rx=int(kernel[1]//2)
sy,sx=step
_mode_vals=[]
if edges:
j0,j1=0,h
i0,i1=0,w
else:
j0,j1=ry,h-ry
i0,i1=rx,w-rx
for j in range(j0,j1,sy):
for i in range(i0,i1,sx):
ap=data[
max(j-ry,0):j+ry+1,
max(i-rx,0):i+rx+1]
cv=data[j,i]
values=np.unique(ap)
count=0
for v in values:
newcount=(ap==v).sum()
if center_boost and (v==cv):
newcount+=center_boost
if high_value:
test=newcount>=count
else:
test=newcount>count
if test:
count=newcount
mode_value=v
_mode_vals.append(mode_value)
return np.array(_mode_vals).reshape(j1-j0,i1-i0)
分类平滑
from scipy.signal import convolve2d
KERNEL=np.ones((3,3))
def categorical_smoothing(data,nb_categories,kernel=KERNEL):
data=np.eye(nb_categories)[:,data]
for i in range(nb_categories):
data[i]=convolve2d(data[i],kernel,mode='same')
return data.argmax(axis=0)
EQUIVALENCE/SPEED-CHECK
这可能很容易证明,但是...
S=512
N=19
data=np.random.randint(0,N,(S,S))
%time o1=mode_smoothing(data,edges=True,center_boost=False)
kernel=np.ones((3,3))
%time o2=categorical_smoothing(data,N,kernel=kernel)
print((o1==o2).all())
print()
data=np.random.randint(0,N,(S,S))
%time o1=mode_smoothing(data,edges=True,center_boost=1)
kernel=np.ones((3,3))
kernel[1,1]=2
%time o2=categorical_smoothing(data,N,kernel=kernel)
print((o1==o2).all())
""" OUTPUT
CPU times: user 826 ms, sys: 0 ns, total: 826 ms
Wall time: 825 ms
CPU times: user 416 ms, sys: 7.83 ms, total: 423 ms
Wall time: 423 ms
True
CPU times: user 825 ms, sys: 3.78 ms, total: 829 ms
Wall time: 828 ms
CPU times: user 422 ms, sys: 3.91 ms, total: 426 ms
Wall time: 425 ms
True
"""