Pushing into a min heap with std::priority_queue is the bottleneck
Is there anything faster than std::priority_queue as a min heap?
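The original question is here. You can resolve the names generated by gprof with the demangler. With the help of the users there, I came to the conclusion that this piece of code (which I expect to be executed many more times than a pop is performed):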
#include <functional>  // std::greater
#include <queue>       // std::priority_queue
#include <tuple>       // std::tuple, std::make_tuple
#include <vector>

/**
 * Min_heap is actually a std::priority_queue,
 * with std::greater as a parameter.
 */
typedef std::priority_queue<std::tuple<float, int, int>,
                            std::vector<std::tuple<float, int, int> >,
                            std::greater<std::tuple<float, int, int> > > Min_heap;
...
void nn(..) {  // the core function of my project
    if(...) {
        branch.push(std::make_tuple(new_dist, other_child_i, tree_i));
    }
}
seems to be the bottleneck of my project (I think), as I saw after calling:
gprof -q_ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i geraf gmon.out > analysis.txt
which gave this:
granularity: each sample hit covers 2 byte(s) for 0.01% of 125.47 seconds
index % time self children called name
1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
105.54 0.09 320000/320000 Auto_random_kd_forest<float>::Auto_random_kd_forest(unsigned int&, unsigned int&, std::string const&, unsigned int, std::string const&, int, float, std::vector<std::vector<std::pair<float, int>, std::allocator<std::pair<float, int> > >, std::allocator<std::vector<std::pair<float, int>, std::allocator<std::pair<float, int> > > > >&, Params*, int) (1)
[2] 84.2 105.54 0.09 320000+1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
0.08 0.00 1967195/2031195 void std::__push_heap<__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> > >(__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> >) [9]
0.01 0.00 12000/12000 _ZNSt6vectorISt5tupleIJfiiEESaIS1_EE19_M_emplace_back_auxIJS1_EEEvDpOT_ [11]
1967195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
-----------------------------------------------
0.00 0.00 64000/2031195 _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIPSt5tupleIJfiiEESt6vectorIS3_SaIS3_EEEEiS3_St7greaterIS3_EEvT_T0_SC_T1_T2_ (10)
0.08 0.00 1967195/2031195 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
[9] 0.1 0.09 0.00 2031195 void std::__push_heap<__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> > >(__gnu_cxx::__normal_iterator<std::tuple<float, int, int>*, std::vector<std::tuple<float, int, int>, std::allocator<std::tuple<float, int, int> > > >, int, int, std::tuple<float, int, int>, std::greater<std::tuple<float, int, int> >) [9]
-----------------------------------------------
0.01 0.00 12000/12000 _ZN3RKDI24Division_Euclidean_spaceIfEE2nnEjRKSt6vectorIfSaIfEERKfRS3_ISt4pairIfiESaISB_EERiiPjRSt14priority_queueISt5tupleIJfiiEES3_ISJ_SaISJ_EESt7greaterISJ_EES9_RKjS7_S7_i [2]
[11] 0.0 0.01 0.00 12000 _ZNSt6vectorISt5tupleIJfiiEESaIS1_EE19_M_emplace_back_auxIJS1_EEEvDpOT_ [11]
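I push more items than I pop. In the question I linked, I show the code related to my heap. We can assume that the heap never becomes empty. I am not sure about the distribution.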
Take a look at the heap comparison on Wikipedia:
http://en.wikipedia.org/wiki/Heap_(data_structure)
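According to that comparison, the fastest is the Fibonacci heap, at least asymptotically: its insert is O(1), versus O(log n) for a binary heap.
AFAIK the STL priority queue is just a binary heap.
You could use the Boost implementation of the Fibonacci heap (boost::heap::fibonacci_heap) to try to improve speed.
There is also a question on SO showing the usage of the Boost heap.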
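To make the "just a binary heap" point concrete, here is a minimal sketch of what a push and a pop on a min heap amount to when written directly with the standard heap algorithms on a std::vector. The names Entry, heap_push and heap_pop are hypothetical helpers introduced only for this illustration; the element type is the tuple from the question.

```cpp
#include <algorithm>   // std::push_heap, std::pop_heap
#include <cassert>
#include <functional>  // std::greater
#include <tuple>
#include <vector>

// Same element type as in the question: (distance, child index, tree index).
typedef std::tuple<float, int, int> Entry;

// Hypothetical helper: push onto a min heap stored in a plain vector.
// Append the element, then sift it up, which costs O(log n) comparisons.
inline void heap_push(std::vector<Entry>& heap, const Entry& e) {
    heap.push_back(e);
    std::push_heap(heap.begin(), heap.end(), std::greater<Entry>());
}

// Hypothetical helper: remove and return the smallest element.
// std::pop_heap moves the smallest element to the back of the vector.
inline Entry heap_pop(std::vector<Entry>& heap) {
    assert(!heap.empty());
    std::pop_heap(heap.begin(), heap.end(), std::greater<Entry>());
    Entry smallest = heap.back();
    heap.pop_back();
    return smallest;
}
```

Since this performs the same kind of work as std::priority_queue, it should not be expected to be faster by itself. It mainly shows where the O(log n) cost of a push comes from, and it gives direct access to the vector if you want to experiment.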
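It is worth mentioning that the data shown on Wikipedia is a theoretical comparison of heaps. The STL implementation is a binary heap because it is usually much faster than a Fibonacci heap for small heaps. A good question summarizing information about Fibonacci heaps is here.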
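Before swapping the heap type, one cheap thing to try with the binary heap is giving the underlying vector its capacity up front. The profile in the question lists _M_emplace_back_aux, which as far as I know is the reallocation path of std::vector in libstdc++. Below is a minimal sketch, assuming you can estimate how many elements the heap will hold; make_reserved_heap and Entry are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>     // std::size_t
#include <functional>  // std::greater
#include <queue>       // std::priority_queue
#include <tuple>
#include <utility>     // std::move
#include <vector>

typedef std::tuple<float, int, int> Entry;
typedef std::priority_queue<Entry, std::vector<Entry>,
                            std::greater<Entry> > Min_heap;

// Hypothetical helper: build an empty Min_heap on top of a vector that
// already has room for `expected` elements. Common implementations keep
// the reserved buffer when the vector is moved into the queue, although
// the standard does not strictly promise it.
inline Min_heap make_reserved_heap(std::size_t expected) {
    std::vector<Entry> storage;
    storage.reserve(expected);
    assert(storage.capacity() >= expected);
    return Min_heap(std::greater<Entry>(), std::move(storage));
}
```

Usage would look like `Min_heap branch = make_reserved_heap(1024);` followed by `branch.emplace(new_dist, other_child_i, tree_i);`, which constructs the tuple in place instead of going through std::make_tuple. Whether this helps at all depends on your data, so measure it before and after.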