Counting runs in columns of a matrix
I have a matrix of 1s and -1s with 0s randomly interspersed:
%// create matrix of 1s and -1s
hypwayt = randn(10,5);
hypwayt(hypwayt > 0) = 1;
hypwayt(hypwayt < 0) = -1;
%// create numz random indices at which to insert 0s (pairs of indices may
%// repeat, so final number of inserted zeros may be < numz)
numz = 15;
a = 1;
b = 10;
r = round((b-a).*rand(numz,1) + a);
s = round((5-1).*rand(numz,1) + a);
for nx = 1:numz
    hypwayt(r(nx),s(nx)) = 0;   %// semicolon added so the matrix is not printed on every iteration
end
Input:
hypwayt =
-1 1 1 1 1
1 -1 1 1 1
1 -1 1 0 0
-1 1 0 -1 1
1 -1 0 0 0
-1 1 -1 -1 -1
1 1 0 1 -1
0 1 -1 1 -1
-1 0 1 1 0
1 -1 0 -1 -1
I want to count how many times in a row each nonzero element repeats within a column.
The basic idea (provided by @rayryeng): for each column independently, every time you hit a new number you start a running counter at 1, and it increments each time the next element equals the previous one. As soon as the number changes, the counter resets to 1, except when you hit a 0, in which case the output is 0.
Expected output:
hypwayt_runs =
1 1 1 1 1
1 1 2 2 2
2 2 3 0 0
1 1 0 1 1
1 1 0 0 0
1 1 1 1 1
1 2 0 1 2
0 3 1 2 3
1 0 1 3 0
1 1 0 1 1
What is the most concise way to do this?
I guess there should be a better way, but this should work.
Using cumsum, diff, accumarray & bsxfun:
%// doing the 'diff' along default dim to get the adjacent equality
out = [ones(1,size(A,2));diff(A)];
%// Putting all other elements other than zero to 1
out(find(out)) = 1;
%// getting all the indexes of 0 elements
ind = find(out == 0);
%// doing 'diff' on indices to find adjacent indices
out1 = [0;diff(ind)];
%// Marking the start of each block of adjacent indices: 1 where the step is not 1, 0 otherwise
%// (the original expression reduced to this because + binds tighter than ~=)
out1 = (out1 ~= 1);
%// counting each unique group's number of elements
out1 = accumarray(cumsum(out1),1);
%// Creating a mask for next operation
mask = bsxfun(@le, (1:max(out1)).',out1.');
%// Doing colon operation from 2 to maxsize
out1 = bsxfun(@times,mask,(2:size(mask,1)+1).'); %'
%// Assign the values from the out1 to corresponding indices of out
out(ind) = out1(mask);
%// finally replace all elements of A which were zero to zero
out(A==0) = 0
Results:
Input:
>> A
A =
-1 1 1 1 1
1 -1 1 1 1
1 -1 1 0 0
-1 1 0 -1 1
1 -1 0 0 0
-1 1 -1 -1 -1
1 1 0 1 -1
0 1 -1 1 -1
-1 0 1 1 0
1 -1 0 -1 -1
Output:
>> out
out =
1 1 1 1 1
1 1 2 2 2
2 2 3 0 0
1 1 0 1 1
1 1 0 0 0
1 1 1 1 1
1 2 0 1 2
0 3 1 2 3
1 0 1 3 0
1 1 0 1 1
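The grouping step in the middle of this answer may be easier to follow on a toy input. The sketch below is only an illustration: the index vector `ind` is made up (it is not taken from the example matrix), and the names `newgrp`, `len`, `vals` and `counts` are introduced here. It shows how `accumarray(cumsum(...),1)` measures the length of each block of adjacent indices, and how the `bsxfun` mask turns those lengths into the counter values 2, 3, 4, ...

```matlab
% Hypothetical linear indices of entries that equal the entry above them
ind = [2; 3; 7; 9; 10; 11];

% A new block starts wherever the step between indices is not 1
newgrp = [0; diff(ind)] ~= 1;              % [1;0;1;1;0;0]

% cumsum turns the block starts into block labels, accumarray counts each label
len = accumarray(cumsum(newgrp), 1);       % [2;1;3]

% One column per block; the first len(k) rows of column k are true
mask = bsxfun(@le, (1:max(len)).', len.');

% A repeated entry is at least the 2nd of its run, so numbering starts at 2
vals = bsxfun(@times, mask, (2:max(len)+1).');

% Reading through the mask returns the counters in the same order as ind
counts = vals(mask);                       % [2;3;2;2;3;4]
```

In the answer above, these counters are what gets written back with `out(ind) = out1(mask)`.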
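Motivated by Dev-IL, here is a solution that uses loops. Although the code is readable, I would argue that it is slow, because you have to iterate over every element individually.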
hypwayt = [-1 1 1 1 1;
1 -1 1 1 1;
1 -1 1 0 0;
-1 1 0 -1 1;
1 -1 0 0 0;
-1 1 -1 -1 -1;
1 1 0 1 -1;
0 1 -1 1 -1;
-1 0 1 1 0;
1 -1 0 -1 -1];
%// Initialize output array
out = ones(size(hypwayt));
%// For each column
for idx = 1 : size(hypwayt, 2)
    %// Previous value initialized as the first row
    prev = hypwayt(1,idx);
    %// A zero in the first row must give 0 as well (the example has none there)
    out(1,idx) = (prev ~= 0);
    %// For each row after this point...
    for idx2 = 2 : size(hypwayt,1)
        %// If the current value isn't equal to the previous value...
        if hypwayt(idx2,idx) ~= prev
            %// Set the new previous value
            prev = hypwayt(idx2,idx);
            %// Case for 0
            if hypwayt(idx2,idx) == 0
                out(idx2,idx) = 0;
            end
            %// Else, reset the value to 1
            %// Already done by initialization
        %// If equal, increment
        %// Must also check for 0
        else
            if hypwayt(idx2,idx) ~= 0
                out(idx2,idx) = out(idx2-1,idx) + 1;
            else
                out(idx2,idx) = 0;
            end
        end
    end
end
Output:
>> out
out =
1 1 1 1 1
1 1 2 2 2
2 2 3 0 0
1 1 0 1 1
1 1 0 0 0
1 1 1 1 1
1 2 0 1 2
0 3 1 2 3
1 0 1 3 0
1 1 0 1 1
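If visiting every element one by one is the concern, one possible middle ground is to loop over the rows only and handle all columns of a row at once. This is a sketch under that assumption, not part of the answer above; the small test matrix and the names `same` and `nz` are made up for illustration.

```matlab
% Small illustrative input: two columns, one with a block of zeros
hypwayt = [ 1 -1;
            1  0;
            1  0;
           -1  1;
           -1  1];

out = zeros(size(hypwayt));
out(1,:) = hypwayt(1,:) ~= 0;              % first row: 1 for nonzero, 0 for zero

for r = 2:size(hypwayt,1)
    same = hypwayt(r,:) == hypwayt(r-1,:); % equal to the element above?
    nz   = hypwayt(r,:) ~= 0;              % nonzero in this row?
    % Continue the count where the value repeats, restart at 1 otherwise,
    % and force 0 wherever the input is 0
    out(r,:) = (out(r-1,:) .* same + 1) .* nz;
end
% out is now [1 1; 2 0; 3 0; 1 1; 2 2]
```

The loop runs once per row instead of once per element, while keeping the same rule as the nested version.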
Building on the answer above, here is my take on a loop-based solution.
Input:
hypwayt = [
-1 1 1 1 1
1 -1 1 1 1
1 -1 1 0 0
-1 1 0 -1 1
1 -1 0 0 0
-1 1 -1 -1 -1
1 1 0 1 -1
0 1 -1 1 -1
-1 0 1 1 0
1 -1 0 -1 -1 ];
expected_out = [
1 1 1 1 1
1 1 2 2 2
2 2 3 0 0
1 1 0 1 1
1 1 0 0 0
1 1 1 1 1
1 2 0 1 2
0 3 1 2 3
1 0 1 3 0
1 1 0 1 1 ];
The actual code:
CNT_INIT = 2; %// a constant representing an initialized counter
out = hypwayt; %// "preallocation"
out(2:end,:) = diff(out); %// ...we'll deal with the top row later
hyp_nnz = hypwayt~=0; %// nonzero mask for later brevity
cnt = CNT_INIT; %// first initialization of the counter
for ind1 = 2:numel(out)
    switch abs(out(ind1))
        case 2 %// switch from -1 to 1 and vice versa:
            out(ind1) = 1;
            cnt = CNT_INIT;
        case 0 %// means we have the same number again:
            out(ind1) = cnt*hyp_nnz(ind1); %// put cnt unless we're zero
            cnt = cnt+1;
        case 1 %// means we transitioned to/from zero:
            out(ind1) = hyp_nnz(ind1); %// was it a nonzero element?
            cnt = CNT_INIT;
    end
end
%// Finally, take care of the top row:
out(1,:) = hyp_nnz(1,:);
Correctness test:
assert(isequal(out,expected_out))
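I guess it could be simplified further using some "complex" MATLAB functions, but IMHO it looks elegant enough :)
Note: the top row of out is computed twice (once in the loop and once at the end), so there is a small inefficiency from computing those values twice. However, it allows the whole logic to live in a single loop running over numel(), which in my opinion justifies this bit of extra computation.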
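The switch in the loop above works because, for values restricted to -1, 0 and 1, the absolute difference between neighbours can only be 0, 1 or 2. A minimal illustration on a made-up column (the names `col` and `d` are not from the answer):

```matlab
% Illustrative column containing every kind of transition
col = [1; 1; -1; 0; 0; 1];

% Absolute difference between each element and the one above it
d = abs(diff(col));   % [0; 2; 1; 0; 1]

% d == 0 : same value as above (1 -> 1, or 0 -> 0)
% d == 2 : sign flip (1 -> -1 or -1 -> 1)
% d == 1 : transition to or from zero
```

Each value of `d` corresponds to one branch of the switch.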
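This is a nice question, and since [...] did not come up with a vectorized solution, here are my few lines. OK, that is not quite fair: it took me half a day to get this one. The basic idea is to use cumsum as the final function.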
p = size(hypwayt,2); % keep nb of columns in mind
% H1 is the mask of consecutive identical values, but kept as an array of double (it will be incremented later)
H1 = [zeros(1,p);diff(hypwayt)==0];
% H2 is the mask of elements where a consecutive sequence of identical values ends. Note the first line of trues.
H2 = [true(1,p);diff(~H1)>0];
% 1st trick: compute the vectorized cumsum of H1
H3 = cumsum(H1(:));
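% 2nd trick: take the diff of H3(H2).
% It gives the lengths of the consecutive sequences of identical values, interleaved with some zeros.
% Subtract it from H1 at the same locations.
H1(H2) = H1(H2)-[0;diff(H3(H2))];
% H1 is ready to be cumsummed! All lengths are short by one, so add one to the array.
% The cumsum runs over H1(:), like H3, so that a run reaching the bottom of one column
% is cancelled by the correction stored in the first row of the next column.
Output = reshape(cumsum(H1(:)),size(hypwayt))+1;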
% last force input zeros to be zero
Output(hypwayt==0) = 0;
Expected output:
Output =
1 1 1 1 1
1 1 2 2 2
2 2 3 0 0
1 1 0 1 1
1 1 0 0 0
1 1 1 1 1
1 2 0 1 2
0 3 1 2 3
1 0 1 3 0
1 1 0 1 1
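Let me add some explanation. The big trick is of course the second one; it took me a while to figure out how to compute the lengths of consecutive identical values quickly. The first one is just a small trick to compute the whole thing without any for loop. If you cumsum H1 directly, you get the result with some offsets. These offsets are removed in a cumsum-compatible way, by taking local differences of some key values and subtracting them right after the end of those sequences. There are more of these special positions than strictly necessary, because I also take the first row (the first line of H2): the first element of each column is treated as different from the last element of the previous column.
I hope it is a bit clearer now (and that there is no flaw in some special case...).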
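The offset removal may be easier to see on a single column. The sketch below is a simplified variant written for illustration, not the code from the answer: it applies the correction at the first element of every run rather than at the positions flagged by H2, and the names `same`, `starts`, `c`, `inc` and `runs` are made up here.

```matlab
% One illustrative column without zeros
x = [1; 1; 1; -1; -1; 1];

same   = [0; diff(x) == 0];   % 1 where x equals the element above: [0;1;1;0;1;0]
starts = (same == 0);         % first element of every run

% Plain cumsum never resets, so later runs inherit an offset
c = cumsum(same);             % [0;1;2;2;3;3]

% At each run start, subtract what accumulated since the previous run start
inc = same;
inc(starts) = -[0; diff(c(starts))];   % inc becomes [0;1;1;-2;1;-1]

% The corrected increments now sum to a counter that restarts with each run
runs = cumsum(inc) + 1;       % [1;2;3;1;2;1]
```

Each negative entry cancels exactly the total of the run before it, which is why a single `cumsum` is enough at the end.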