Find timeline for duration values in Matlab

Question

I have the following time series:

b = [2 5 110 113 55 115 80 90 120 35 123];

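Each number in b is one data point at one time instant. I computed duration values from b. A duration consists of all numbers in b that are greater than or equal to 100 and arranged consecutively (all other numbers are discarded). A maximum gap of values smaller than 100 is allowed. This is the code for the duration: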

N = 2;     % maximum allowed gap
duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));

This gives the following duration values for b:

duration = [4 3];

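I would like to find the positions (timeline) in b for each value in duration. Next, I would like to replace all other positions, those located outside a duration, with zeros. The result would look like this: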

result = [0 0 3 4 5 6 0 0 9 10 11];

It would be great if someone could help.

Answer 1

Answer to the original question: pattern with at most one value below 100

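Here's an approach that uses a regular expression to detect the desired pattern. I'm assuming a value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100, with a possible single value <100 in between.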

b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:)), 1); %// filter by locations of pattern (dim 1 keeps a row even with a single match)
y = y.*c; %// result

This gives

y =
     0     0     3     4     5     6     0     0     9    10    11
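
To see what the regular expression actually captures, the matched substrings and their boundaries can be inspected directly. This is only a small sketch on the same data; the names m and len are introduced here for illustration and are not part of the answer's code.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];
B = char((b>=100)+'0');                 % '00110100101'
% Ask regexp for the matched text together with start and end indices
[m, s, e] = regexp(B, '1+(.1+|)', 'match', 'start', 'end');
% m = {'1101', '101'}, s = [3 9], e = [6 11]
len = e - s + 1;                        % [4 3], the duration values from the question
```

Each match covers one run, so the match lengths reproduce duration and the start and end indices delimit the timeline of each run.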

Answer to the edited question: pattern with at most n consecutive values below 100

The regular expression needs to be modified, and it has to be built dynamically as a function of n:

b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:)), 1); %// filter by locations of pattern (dim 1 keeps a row even with a single match)
y = y.*c; %// result
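
As a rough illustration of how n changes the outcome, the sketch below runs the same idea for n = 1 and n = 2 on the example data. The names pos and results are hypothetical, and any is called with an explicit dimension argument so that a single match still yields a row mask.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];
B = char((b>=100)+'0');                    % '00110100101'
pos = 1:numel(B);
results = zeros(2, numel(B));              % one row per value of n
for n = 1:2
    r = sprintf('1+(.{1,%i}1+)*', n);      % allow gaps of up to n values below 100
    [s, e] = regexp(B, r, 'start', 'end');
    % any(..., 1) reduces over matches, so one match still gives a 1-by-numel(B) mask
    c = any(bsxfun(@ge, pos, s(:)) & bsxfun(@le, pos, e(:)), 1);
    results(n,:) = pos.*c;
end
% results(1,:) = [0 0 3 4 5 6 0 0 9 10 11]
% results(2,:) = [0 0 3 4 5 6 7 8 9 10 11]
```

With n = 1 the two-value gap at positions 7 and 8 splits the data into two runs; with n = 2 that gap is bridged and everything from position 3 onward forms a single run.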

Answer 2

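Here's another solution that doesn't use regexp. It generalizes naturally to arbitrary gap sizes and thresholds. I'm not sure whether there's a better way to fill the gaps. Explanation in the comments: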

% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];

% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
    B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds the points where we switch from 0
% to 1 and vice versa. Due to the padding, the first match is always the
% start of a run and the last one always an end. There is an even number of
% matches, so we can reshape and diff and thus find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));

% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);
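
If a separate timeline per duration value is wanted, the same edge detection can be reused. The following is a sketch under the assumption that B already holds the gap-filled mask for the example data with N = 2; run_start, run_end and timelines are illustrative names.

```matlab
% gap-filled logical mask for the example data (assumed result of the loop above)
B = logical([0 0 1 1 1 1 0 0 1 1 1]);
% row 1: index where each run starts, row 2: index just past where it ends
edges = reshape(find(diff([false, B, false])), 2, []);
run_start = edges(1,:);                 % [3 9]
run_end   = edges(2,:) - 1;             % [6 11]
durations = run_end - run_start + 1;    % [4 3]
% one index vector per run: timelines{1} = 3:6, timelines{2} = 9:11
timelines = arrayfun(@(s, e) s:e, run_start, run_end, 'UniformOutput', false);
```

A cell array is used because runs generally have different lengths.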
