从 Boost 图中删除 100,000 多个节点

Question

我有一个图表 ( adjacency_list (listS, vecS, bidirectionalS, VertexVal) )，我需要在其中删除超过 100,000 个节点。每个节点还包含一个由 2 个 64 位整数和另一个 64 位整数组成的结构。下面代码中发生的 guid 检查正在检查结构中的第一个整数。

根据 VTune，在我的笔记本电脑（i7 2.7GHz，16GB RAM）上大约需要 88 秒。

以下是我删除节点的方法：

  vertex_iterator vi,vi_end;
  boost::tie(vi, vi_end) = boost::vertices(m_graph);
  while (vi!=vi_end) {
    if (m_graph[*vi].guid.part1 == 0) {
      boost::remove_vertex(*vi,m_graph);
      boost::tie(vi, vi_end) = boost::vertices(m_graph);
    } else 
      ++vi;
  }

Vtune 显示 boost::remove_vertex() 调用需要 88.145 秒。有没有更有效的方法来删除这些顶点？

Answer 1

看到更多 Vtune 数据会很有趣。

我的经验是，在删除数万个小对象时，默认的 Microsoft 分配器可能是一个很大的瓶颈。您的 Vtune 图表是否显示大量时间处于删除或空闲状态？

如果是这样，请考虑切换到第三方分配器。 Nedmalloc据说不错：http://www.nedprod.com/programs/portable/nedmalloc/

~~Google 有一个，tcmalloc，它得到了很好的评价，并且比几乎所有平台上的内置分配器都快得多。 https://code.google.com/p/gperftools/~~ tcmalloc 不是 Windows 的替代品。

Answer 2

在您的删除分支中，您重新tie() 迭代器：

boost::tie(vi, vi_end) = boost::vertices(m_graph);

这将导致每次重新启动循环时循环重新启动。这正是 Schlemiel The Painter.

我会查明您是否可以相信 remove_vertex 不会触发重新分配。如果是这样，它很容易修复。否则，您需要基于索引器的循环而不是基于迭代器的循环。或者您可以在原始容器上工作（不过，我记得它是私有成员）。

Update 使用 vecS 作为顶点的容器会导致性能不佳：

If the VertexList template parameter of the adjacency_list was vecS, then all vertex descriptors, edge descriptors, and iterators for the graph are invalidated by this operation. <...> If you need to make frequent use of the remove_vertex() function the listS selector is a much better choice for the VertexList template parameter.

这个small benchmark test.cpp比较：

和-DSTABLE_IT (listS)

$ ./stable 
Generated 100000 vertices and 5000 edges in 14954ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges

没有-DSTABLE_IT (vecS)

$ ./unstable 
Generated 100000 vertices and 5000 edges in 76ms
The graph has a cycle? false
starting selective removal...
Done in 396ms
After: 99032 vertices and 4916 edges

使用filtered_graph（感谢评论中的@cv_and_he）

Generated 100000 vertices and 5000 edges in 15ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges
Done in 13ms

您可以清楚地看到，对于 listS，删除很多，但生成很多慢一些。

Answer 3

我能够使用 Boost 序列化例程将图形成功序列化为一个字符串，解析该字符串并删除我不需要的节点，并对修改后的字符串进行反序列化。对于图中总共 200,000 个节点和需要删除的 100,000 个节点，我能够在不到 2 秒的时间内成功完成操作。

对于我的特定用例，每个顶点都有 3 个 64 位整数。当需要删除它时，我将这些整数中的 2 个标记为 0。一个有效的顶点永远不会有 0。当要清理图形时 - 删除 "deleted" 个顶点，我遵循上面的逻辑。

在下面的代码中，removeDeletedNodes() 执行字符串解析和移除顶点并映射边数。

从 Boost 图中删除 100,000 多个节点

Remove 100,000+ nodes from a Boost graph

c++

boost