在数组中查找和乘以重复值的最快方法
Fastest way of finding and multiplying repeated values in array
在数组中查找重复值并将它们相乘的最快方法是什么?
示例:
a = [ 2 2 3 5 11 11 17 ]
结果:
a = [ 4 3 5 121 17 ]
我可以想到迭代方法(通过查找历史记录、遍历 bins,...),但是有 vectorized/fast 方法吗?
前瞻性方法和解决代码
似乎发布的问题很适合 accumarray
-
%// Starting indices of each "group"
start_ind = find(diff([0 ; a(:)]))
%// Setup IDs for each group
id = zeros(1,numel(a)) %// Or id(numel(a))=0 for faster pre-allocation
id(start_ind) = 1
%// Use accumarray to get the products of elements within the same group
out = accumarray(cumsum(id(:)),a(:),[],@prod)
对于非单调递增的输入,需要多加两行代码-
[~,sorted_idx] = ismember(sort(start_ind),start_ind)
out = out(sorted_idx)
样本运行-
>> a
a =
2 2 3 5 11 11 17 4 4 1 1 1 7 7
>> out.'
ans =
4 3 5 121 17 16 1 49
Tweaky-Squeaky
现在,可以使用 logical indexing
删除 find
,也可以使用
更快的预分配方案,为所提出的方法提供 超级提升 并为我们提供 调整后的代码 -
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
基准测试
这是基准测试代码,它比较了迄今为止针对所述问题发布的所有提议方法 运行 次 -
%// Setup huge random input array
maxn = 10000;
N = 100000000;
a = sort(randi(maxn,1,N));
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------- With UNIQUE')
tic
ua = unique(a);
out = ua.^histc(a,ua);
toc, clear ua out
disp('------------------------- With ACCUMARRAY')
tic
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
toc, clear out id
disp('------------------------- With FOR-LOOP')
tic
b = a(1);
for k = 2:numel(a)
if a(k)==a(k-1)
b(end) = b(end)*a(k);
else
b(end+1) = a(k);
end
end
toc
运行时间
------------------------- With UNIQUE
Elapsed time is 3.050523 seconds.
------------------------- With ACCUMARRAY
Elapsed time is 1.710499 seconds.
------------------------- With FOR-LOOP
Elapsed time is 1.811323 seconds.
结论: 看起来 运行 次,支持 accumarray
的想法优于其他两种方法!
ua = unique(a)
out = ua.^histc(a,ua)
out =
4 3 5 121 17
考虑到向量 a
是 非单调递增 的情况,它变得有点复杂:
%// non monotonically increasing vector
a = [ 2 2 3 5 11 11 17 4 4 1 1 1 7 7]
[ua, ia] = unique(a) %// get unique values and sort as required for histc
[~, idx] = ismember(sort(ia),ia) %// get original order
hc = histc(a,ua) %// count occurences
prods = ua.^hc %// calculate products
out = prods(idx) %// reorder to original order
或:
ua = unique(a,'stable') %// get unique values in original order
uas = unique(a) %// get unique values sorted as required for histc
[~,idx] = ismember(ua,uas) %// get indices of original order
hc = histc(a,uas) %// count occurences
out = ua.^hc(idx) %// calculate products and reorder
out =
4 3 5 121 17 16 1 49
似乎仍然是一个很好的解决方案,因为 accumarray
也是 。
您可能会惊讶于一个简单的 for
循环在速度方面的比较:
b = a(1);
for k = 2:numel(a)
if a(k)==a(k-1)
b(end) = b(end)*a(k);
else
b(end+1) = a(k);
end
end
即使不进行任何预分配,它的性能也与 accumarray
解决方案相当。
在数组中查找重复值并将它们相乘的最快方法是什么?
示例:
a = [ 2 2 3 5 11 11 17 ]
结果:
a = [ 4 3 5 121 17 ]
我可以想到迭代方法(通过查找历史记录、遍历 bins,...),但是有 vectorized/fast 方法吗?
前瞻性方法和解决代码
似乎发布的问题很适合 accumarray
-
%// Starting indices of each "group"
start_ind = find(diff([0 ; a(:)]))
%// Setup IDs for each group
id = zeros(1,numel(a)) %// Or id(numel(a))=0 for faster pre-allocation
id(start_ind) = 1
%// Use accumarray to get the products of elements within the same group
out = accumarray(cumsum(id(:)),a(:),[],@prod)
对于非单调递增的输入,需要多加两行代码-
[~,sorted_idx] = ismember(sort(start_ind),start_ind)
out = out(sorted_idx)
样本运行-
>> a
a =
2 2 3 5 11 11 17 4 4 1 1 1 7 7
>> out.'
ans =
4 3 5 121 17 16 1 49
Tweaky-Squeaky
现在,可以使用 logical indexing
删除 find
,也可以使用
更快的预分配方案,为所提出的方法提供 超级提升 并为我们提供 调整后的代码 -
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
基准测试
这是基准测试代码,它比较了迄今为止针对所述问题发布的所有提议方法 运行 次 -
%// Setup huge random input array
maxn = 10000;
N = 100000000;
a = sort(randi(maxn,1,N));
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------- With UNIQUE')
tic
ua = unique(a);
out = ua.^histc(a,ua);
toc, clear ua out
disp('------------------------- With ACCUMARRAY')
tic
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
toc, clear out id
disp('------------------------- With FOR-LOOP')
tic
b = a(1);
for k = 2:numel(a)
if a(k)==a(k-1)
b(end) = b(end)*a(k);
else
b(end+1) = a(k);
end
end
toc
运行时间
------------------------- With UNIQUE
Elapsed time is 3.050523 seconds.
------------------------- With ACCUMARRAY
Elapsed time is 1.710499 seconds.
------------------------- With FOR-LOOP
Elapsed time is 1.811323 seconds.
结论: 看起来 运行 次,支持 accumarray
的想法优于其他两种方法!
ua = unique(a)
out = ua.^histc(a,ua)
out =
4 3 5 121 17
考虑到向量 a
是 非单调递增 的情况,它变得有点复杂:
%// non monotonically increasing vector
a = [ 2 2 3 5 11 11 17 4 4 1 1 1 7 7]
[ua, ia] = unique(a) %// get unique values and sort as required for histc
[~, idx] = ismember(sort(ia),ia) %// get original order
hc = histc(a,ua) %// count occurences
prods = ua.^hc %// calculate products
out = prods(idx) %// reorder to original order
或:
ua = unique(a,'stable') %// get unique values in original order
uas = unique(a) %// get unique values sorted as required for histc
[~,idx] = ismember(ua,uas) %// get indices of original order
hc = histc(a,uas) %// count occurences
out = ua.^hc(idx) %// calculate products and reorder
out =
4 3 5 121 17 16 1 49
似乎仍然是一个很好的解决方案,因为 accumarray
也是
您可能会惊讶于一个简单的 for
循环在速度方面的比较:
b = a(1);
for k = 2:numel(a)
if a(k)==a(k-1)
b(end) = b(end)*a(k);
else
b(end+1) = a(k);
end
end
即使不进行任何预分配,它的性能也与 accumarray
解决方案相当。