What's an efficient way to count duplicates of each pair in a vector?
Is there an efficient way to count the duplicates of each pair in a vector?
For example, if I have a vector like this:
vector<pair<int, int> > duplicates={{1,2},{3,2},{2,1},{5,6},{5,6},{1,2},{2,1},{5,6}};
The output should be:
{1,2}:2
{3,2}:1
{2,1}:2
{5,6}:3
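To be clear, I just want to know how to solve this more efficiently. I tried comparing every pair in the vector against every other one, which doesn't seem like a smart approach.
A simple way is to use a map or an unordered_map to count them: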
#include <iostream>
#include <map>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::pair<int, int>> duplicates = {{1,2},{3,2},{2,1},{5,6},{5,6},{1,2},{2,1},{5,6}};
    std::map<std::pair<int, int>, int> checker;
    for (const auto &elem : duplicates)
    {
        ++checker[elem];  // operator[] value-initializes a missing count to 0
    }
    // std::map iterates in sorted key order
    for (const auto &elem : checker)
        std::cout << "{" << elem.first.first << "," << elem.first.second << "}: " << elem.second << std::endl;
    return 0;
}
Note that map insertion/lookup is O(log(n)), so the loop around it makes the whole thing approximately O(n*log(n)).
EDIT:
Following the OP's additional clarification, here is a better (faster) implementation using unordered_map:
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

// std::hash has no specialization for std::pair, and adding one to namespace std
// is only permitted for program-defined types, so a custom hasher is passed
// to unordered_map as its third template argument instead.
struct PairHash
{
    std::size_t operator()(std::pair<int, int> const &p) const
    {
        // Fine for 64-bit size_t and 32-bit int. Otherwise, some collisions may happen.
        std::size_t result = (static_cast<std::size_t>(p.first) << (sizeof(std::size_t) << 2))
                             + static_cast<std::size_t>(p.second);
        return result;
    }
};

int main()
{
    std::vector<std::pair<int, int>> duplicates = {{1,2},{3,2},{2,1},{5,6},{5,6},{1,2},{2,1},{5,6}};
    std::unordered_map<std::pair<int, int>, int, PairHash> checker;
    for (const auto &elem : duplicates)
    {
        ++checker[elem];  // value initialized with 0
    }
    // Iteration order of an unordered_map is unspecified
    for (const auto &elem : checker)
        std::cout << "{" << elem.first.first << "," << elem.first.second << "}: " << elem.second << std::endl;
    return 0;
}
Insertion into an unordered_map uses a hash, which usually makes it constant time (the worst case, when there are collisions, is linear). The final complexity is O(N) on average.
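I have a simple solution:
- Sort the vector of pairs
- After that a single loop is enough: whenever consecutive pairs match, increment the counter
Complexity of the general (pairwise) search: n*n
Complexity of this approach: n*log(n)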
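The sort-then-scan idea above could be sketched roughly as follows. This is only an illustration, not code from the answer: the helper name count_by_sorting and the Pair alias are made up for this sketch, and the input is taken by value so that the caller's vector keeps its original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;  // hypothetical alias, for brevity only

// Returns each distinct pair together with its number of occurrences,
// ordered by the pairs' lexicographic ordering.
// The argument is a copy, so sorting does not disturb the caller's data.
std::vector<std::pair<Pair, std::size_t>> count_by_sorting(std::vector<Pair> items)
{
    // After sorting, equal pairs sit next to each other: O(n log n)
    std::sort(items.begin(), items.end());

    std::vector<std::pair<Pair, std::size_t>> counts;
    for (const Pair &p : items)
    {
        if (!counts.empty() && counts.back().first == p)
            ++counts.back().second;     // same as the previous pair: extend the run
        else
            counts.emplace_back(p, 1);  // a new run starts here
    }
    return counts;
}
```

Called on the vector from the question, this should yield {1,2}:2, {2,1}:2, {3,2}:1, {5,6}:3. As with std::map, the results come out in sorted order rather than in order of first appearance.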