2D median filter, ignore nan values

Question

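As part of my project, I need code that performs a median filter over an r x r window while ignoring nan values.

I currently use MATLAB's nlfilter function. The problem is that it is very slow: a 300x300 example takes almost 5 seconds, whereas MATLAB's medfilt2 takes 0.2 seconds. Does anyone have a more efficient and more elegant solution?

Note: the behaviour at the boundary is not important to me. In this example, nlfilter automatically pads the array with zeros, but other solutions (such as boundary replication) are fine too.

Code example: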

%Initialize input
r = 3; % window size is 3x3
I = [9,1,6,10,1,5,4;2,4,3,8,8,NaN,5;4,5,8,6,2,NaN,3;5,NaN,6,4,NaN,4,9;3,1,10,9,4,3,2;10,9,10,10,6,NaN,5;10,9,4,1,2,7,2];
nanMask = isnan(I); % remember where the nans are so they can be restored afterwards

%perform median filter on rxr window, ignore nans
f = @(A)median(A(~isnan(A)));
filteredRes = nlfilter(I, [r r], f);
filteredRes(nanMask) = nan;

Expected result

Before filtering:

I =
 9     1     6    10     1     5     4
 2     4     3     8     8   NaN     5
 4     5     8     6     2   NaN     3
 5   NaN     6     4   NaN     4     9
 3     1    10     9     4     3     2
10     9    10    10     6   NaN     5
10     9     4     1     2     7     2

After filtering:

filteredRes =
     0    2.0000    3.0000    3.0000    3.0000    2.5000         0
2.0000    4.0000    6.0000    6.0000    6.0000       NaN    3.0000
3.0000    4.5000    5.5000    6.0000    5.0000       NaN    3.0000
2.0000       NaN    6.0000    6.0000       NaN    3.0000    2.5000
2.0000    7.5000    9.0000    7.5000    4.0000    4.0000    2.5000
3.0000    9.0000    9.0000    6.0000    5.0000       NaN    2.0000
     0    9.0000    4.0000    2.0000    1.5000    2.0000         0
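
To make the requirement concrete, here is a minimal sketch of the calculation for one neighbourhood of the example input, the 3x3 window centred on I(3,2). The variable names W and m are only for illustration.

```matlab
% One 3x3 neighbourhood from the example input: rows 2-4, columns 1-3 of I,
% centred on I(3,2).
W = [2 4 3; 4 5 8; 5 NaN 6];

f = @(A) median(A(~isnan(A)));   % same anonymous function as in the code above
m = f(W);                        % median of the 8 non-nan values: (4 + 5) / 2

disp(m)                          % 4.5000, the value shown at filteredRes(3,2)
```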

Thanks!

Answer 1

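You can first use padarray to pad the image, where you want to pad floor(r/2) pixels on each side, then use im2col to restructure the padded image so that each pixel neighbourhood is placed in a separate column. Next, you'll need to set all of the nan values to a dummy value first so that they don't interfere with the median calculation... perhaps zero. After that, find the median of each column and reshape the result back into an image of the proper size.

Something like this should work: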
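
To see what the padarray and im2col steps produce, here is a small sketch on a 3x3 matrix. It assumes the Image Processing Toolbox is available; the names A, AP and cols are only for illustration.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
r = 3;

AP = padarray(A, floor([r/2 r/2]), 'both');  % 5x5, zero border of width 1
cols = im2col(AP, [r r], 'sliding');          % 9x9: one column per pixel of A

disp(size(cols))     % 9 9
disp(cols(:,1)')     % neighbourhood of A(1,1): 0 0 0 0 1 4 0 2 5
disp(cols(:,5)')     % neighbourhood of A(2,2): 1 4 7 2 5 8 3 6 9
```

The columns are ordered column-major over the pixel positions, which is why a plain reshape back to the size of the input puts each median at the right pixel.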

r = 3;
nanMask = isnan(I); % Define nan mask
Ic = I;
Ic(nanMask) = 0; % Create new copy of image and set nan elements to zero
IP = padarray(Ic, floor([r/2 r/2]), 'both'); % Pad image
IPc = im2col(IP, [r r], 'sliding'); % Transform into columns
out = reshape(median(IPc, 1), size(I,1), size(I,2)); % Find median of each column and reshape back
out(nanMask) = nan; % Set nan elements back

We get:

>> out

out =

     0     2     3     3     1     1     0
     2     4     6     6     5   NaN     0
     2     4     5     6     4   NaN     0
     1   NaN     6     6   NaN     3     2
     1     6     9     6     4     4     2
     3     9     9     6     4   NaN     2
     0     9     4     2     1     2     0

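With the above approach, what differs slightly from your expected results is that we have set all nan values to 0 and these values are included in the median. In addition, should the number of elements in the median be even, I simply choose the element to the right of the ambiguity as the final output.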
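
As a quick illustration of how the zero substitution shifts the result, here is a sketch for the zero-padded neighbourhood of I(1,5) from the example input. The names W, Wz, mZero and mOmit are only for illustration.

```matlab
% Zero-padded 3x3 neighbourhood of I(1,5): a row of padding zeros on top,
% then columns 4-6 of the first two rows of I.
W = [0 0 0; 10 1 5; 8 8 NaN];

Wz = W;
Wz(isnan(Wz)) = 0;               % nan replaced by the dummy value 0
mZero = median(Wz(:));           % 9 values, with the extra 0 included
mOmit = median(W(~isnan(W)));    % 8 values, with the nan left out

fprintf('nan as zero: %g, nan omitted: %g\n', mZero, mOmit);
```

The first value (1) is what the zero-substitution output shows at position (1,5), while the second (3) is the value in the expected result.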

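This may not be exactly what you want. A more valid way would be to sort all of the columns individually while leaving the nan values as they are, then determine the last valid element in each column, work out where the halfway point is for each column, and choose those elements from the sorted columns. A benefit of using sort is that nan values are pushed towards the end of the array.

Something like this could work: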

r = 3;
nanMask = isnan(I); % Define nan mask
IP = padarray(I, floor([r/2 r/2]), 'both'); % Pad image
IPc = im2col(IP, [r r], 'sliding'); % Transform into columns
IPc = sort(IPc, 1, 'ascend'); % Sort the columns
[~,ind] = max(isnan(IPc), [], 1); % For each column, find the first nan, i.e. one past the last valid number
ind(ind == 1) = r*r; % Handles the case when there are all valid numbers per column
ind = ceil(ind / 2); % Find the halfway point
out = reshape(IPc(sub2ind(size(IPc), ind, 1:size(IPc,2))), size(I,1), size(I,2)); % Find median of each column and reshape back
out(nanMask) = nan; % Set nan elements back

We now get:

>> out

out =

     0     2     3     3     5     4     0
     2     4     6     6     6   NaN     3
     4     5     6     6     6   NaN     3
     3   NaN     6     6   NaN     3     3
     3     9     9     9     4     4     3
     3     9     9     6     6   NaN     2
     0     9     4     2     2     2     0
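
To trace the sort-based selection on a single column, here is a sketch for the zero-padded neighbourhood of I(1,5). It uses find on one column instead of the vectorised max call, so the empty-result check is an adaptation for this illustration rather than part of the code above.

```matlab
r = 3;
W = [0 0 0; 10 1 5; 8 8 NaN];    % zero-padded neighbourhood of I(1,5)
col = W(:);                      % the column im2col would produce for this pixel

s = sort(col, 1, 'ascend');      % nan ends up last: 0 0 0 1 5 8 8 10 NaN
ind = find(isnan(s), 1);         % position of the first nan (9 here)
if isempty(ind) || ind == 1
    ind = r*r;                   % no nan found (the ind == 1 case in the vectorised code)
end
ind = ceil(ind / 2);             % halfway point: 5
m = s(ind);                      % 5, the value shown at out(1,5)
```

With eight valid values the two middle elements are 1 and 5, and the index picks the one on the right, which is why this output differs from the expected 3 at that position.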

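Minor note

In more recent versions of MATLAB, median accepts an optional third input called nanflag, where you can explicitly determine what to do when nans are encountered. If you set the flag to omitnan, all nan elements are ignored in the calculation; the default is includenan, which is what you get when you don't specify a third parameter. If you specify omitnan in the median call and skip the step that sets the nan values to 0, you'll get what you wanted from the output of nlfilter: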

r = 3;
nanMask = isnan(I); % Define nan mask
IP = padarray(I, floor([r/2 r/2]), 'both'); % Pad image
IPc = im2col(IP, [r r], 'sliding'); % Transform into columns
out = reshape(median(IPc, 1, 'omitnan'), size(I,1), size(I,2)); % Find median of each column and reshape back
out(nanMask) = nan; % Set nan elements back

We get:

>> out

out =

         0    2.0000    3.0000    3.0000    3.0000    2.5000         0
    2.0000    4.0000    6.0000    6.0000    6.0000       NaN    3.0000
    3.0000    4.5000    5.5000    6.0000    5.0000       NaN    3.0000
    2.0000       NaN    6.0000    6.0000       NaN    3.0000    2.5000
    2.0000    7.5000    9.0000    7.5000    4.0000    4.0000    2.5000
    3.0000    9.0000    9.0000    6.0000    5.0000       NaN    2.0000
         0    9.0000    4.0000    2.0000    1.5000    2.0000         0
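
For reference, here is a small sketch of how the nanflag option behaves column by column, assuming a MATLAB version whose median supports it. The matrix M is made up for illustration.

```matlab
M = [1   4;
     NaN NaN;
     3   6;
     5   NaN];

mInclude = median(M, 1);             % default includenan: [NaN NaN]
mOmit    = median(M, 1, 'omitnan');  % nans skipped per column: [3 5]
```

The first column keeps three values (1, 3, 5) and the second keeps two (4, 6), so an even count is averaged rather than resolved to one side.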

A more efficient `im2col` solution

User Divakar has implemented a faster version of im2col, which he has benchmarked and shown to be a lot faster than the im2col provided by MATLAB's Image Processing Toolbox. If you're going to call this code many times, consider using his implementation: Efficient Implementation of `im2col` and `col2im`
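
To give a rough idea of how a sliding im2col can be built from plain indexing, here is a minimal sketch using bsxfun. It is an illustration written for this answer under the assumption of a 2D input, not a copy of the linked implementation, and the names first, offs and idx are hypothetical.

```matlab
A = reshape(1:20, 4, 5);          % small test matrix
blk = [3 3];                      % neighbourhood size
[M, N] = size(A);

% Linear indices of the first (top-left) block, in column-major order
first = bsxfun(@plus, (1:blk(1))', (0:blk(2)-1) * M);
% Linear-index offset of every block's top-left corner
offs = bsxfun(@plus, (0:M-blk(1))', (0:N-blk(2)) * M);
% One column of indices per block, then gather the values
idx = bsxfun(@plus, first(:), offs(:)');
cols = A(idx);                    % 9 x 6 here, one neighbourhood per column
```

Because everything is computed with index arithmetic and a single gather, there is no loop over blocks, which is the general idea behind faster replacements for the toolbox function.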

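Timing tests

To determine whether the proposed method is faster, I'll perform a timing test using timeit. First, I'll create a function that sets up the common variables and defines two nested functions: the first is the original method using nlfilter and the second is the proposed method. I'll use the variant with 'omitnan', since it produces the result you want.

Here's the function I wrote. I've generated a 300 x 300 input, just like your setup, containing uniformly random numbers between 0 and 1. I've made it so that approximately 20% of the values in this input are nan. I've also set up the anonymous function you use with nlfilter to take the median without nans, as well as the neighbourhood size, which is 3 x 3. I then define two functions within this code: the original method, which filters with nlfilter, and the method I proposed above with the omitnan option: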

function time_nan

% Initial setup
rng(112234);
I = rand(300,300);
I(I < 0.2) = nan; % Modify approximately 20% of the values in the input with nan
r = 3; % Median filter of size 3
nanMask = isnan(I); % Determine which locations are nan
f = @(A)median(A(~isnan(A))); % Handle to function used by nlfilter

    function original_method
        filteredRes = nlfilter(I, [r r], f);
        filteredRes(nanMask) = nan;
    end

    function new_method
        IP = padarray(I, floor([r/2 r/2]), 'both'); % Pad image
        IPc = im2col(IP, [r r], 'sliding'); % Transform into columns
        out = reshape(median(IPc, 1, 'omitnan'), size(I,1), size(I,2)); % Find median of each column and reshape back
        out(nanMask) = nan; % Set nan elements back
    end

t1 = timeit(@original_method);
t2 = timeit(@new_method);

fprintf('Average time for original method: %f seconds\n', t1);
fprintf('Average time for new method: %f seconds\n', t2);

end

My current machine is an HP ZBook G5 with 16 GB of RAM and an Intel Core i7 @ 2.80 GHz. When I run this code, I get the following result:

Average time for original method: 1.033838 seconds
Average time for new method: 0.038697 seconds

As you can see, the new method runs about (1.033838 / 0.038697) = 26.7162 times faster than nlfilter. Not bad!
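
Before relying on the timings, it may be worth confirming that the two methods agree. The following is a sketch of such a check on a smaller 50 x 50 input. It assumes the Image Processing Toolbox and a median that supports 'omitnan', and the tolerance 1e-12 is an arbitrary choice for this illustration.

```matlab
rng(112234);
I = rand(50, 50);
I(I < 0.2) = nan;                 % roughly 20% nan values
r = 3;
nanMask = isnan(I);

% Original method
f = @(A) median(A(~isnan(A)));
ref = nlfilter(I, [r r], f);
ref(nanMask) = nan;

% Proposed method
IP = padarray(I, floor([r/2 r/2]), 'both');
IPc = im2col(IP, [r r], 'sliding');
out = reshape(median(IPc, 1, 'omitnan'), size(I,1), size(I,2));
out(nanMask) = nan;

% Compare only where the input was valid; both outputs are nan elsewhere
maxDiff = max(abs(ref(~nanMask) - out(~nanMask)));
fprintf('Largest difference between the two methods: %g\n', maxDiff);
```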
