How to speed up "global" queries in Titan DB?

Question

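We use Titan with Persistit as the backend for a graph with about 100,000 vertices. Our use case is fairly complex, but the current problem can be illustrated with a simple example. Suppose we store Books and Authors in the graph. Each Book vertex has an ISBN number that is unique across the whole graph.

I need to answer the following question: give me the set of ISBN numbers of all books in the graph.

Currently we do it like this: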

// retrieve graph instance
TitanGraph graph = getGraph(); 
// Start a Gremlin query (I omit the generics for brevity here)
GremlinPipeline gremlin = new GremlinPipeline().start(graph);
// get all vertices in the graph which represent books (we have author vertices, too!)
gremlin.V("type", "BOOK");
// the ISBN numbers are unique, so we use a Set here
Set<String> isbnNumbers = new HashSet<String>();
// iterate over the gremlin result and retrieve the vertex property
while(gremlin.hasNext()){
    // explicit casts keep this compiling with the raw (non-generic) pipeline type
    Vertex v = (Vertex) gremlin.next();
    isbnNumbers.add((String) v.getProperty("ISBN"));
}
return isbnNumbers;

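My question is: is there a smarter way to do this faster? I am new to Gremlin, so it is quite possible that I am doing something really stupid here. The query currently takes 2.5 seconds, which is not too bad, but I would like to speed it up if possible. Please treat the backend as fixed.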

Answer 1

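I doubt there is a faster way (you will always need to iterate over all book vertices), but Groovy/Gremlin offers a more concise solution for your task. On the sample graph you can run, for example, the following query: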

gremlin> namesOfJavaProjs = []; g.V('lang','java').name.store(namesOfJavaProjs)
gremlin> namesOfJavaProjs
==>lop
==>ripple

Or, for your book graph:

isbnNumbers = []; g.V('type','BOOK').ISBN.store(isbnNumbers)
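
Since the question uses Java rather than the Groovy console, the same traversal can probably be written in Gremlin-Java roughly as below. This is a minimal sketch, assuming the TinkerPop 2.x libraries (Blueprints, Pipes, Gremlin-Java) that Titan builds on are on the classpath; the class name IsbnQuery and the method collectIsbns are hypothetical names chosen for illustration. It still visits every book vertex, so it is shorter but not expected to be faster.

```java
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.gremlin.java.GremlinPipeline;

import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of Titan or TinkerPop.
public class IsbnQuery {

    // Collects the ISBN property of every vertex whose "type" is "BOOK".
    public static Set<String> collectIsbns(Graph graph) {
        Set<String> isbnNumbers = new HashSet<String>();

        // V(key, value) selects the matching vertices; property("ISBN")
        // maps each vertex to its ISBN value inside the pipeline.
        GremlinPipeline<Graph, Object> pipeline =
                new GremlinPipeline<Graph, Vertex>(graph)
                        .V("type", "BOOK")
                        .property("ISBN");

        // Assumes every ISBN was stored as a String.
        while (pipeline.hasNext()) {
            isbnNumbers.add((String) pipeline.next());
        }
        return isbnNumbers;
    }
}
```

Calling IsbnQuery.collectIsbns(getGraph()) would then replace the loop from the question. Moving the property lookup into the pipeline mainly saves boilerplate; the number of vertices touched stays the same.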

Tags: gremlin, titan, tinkerpop