Gremlin query to load a CSV file with selected columns

Question

I am using the following script in Gremlin to create a graph from a CSV file:

graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class) //(1)
g = graph.traversal()
getOrCreate = { id ->
 g.V().has('userId', id).tryNext().orElseGet{ g.addV('userId', id).next() }
}
new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")){
    def (fromVertex, toVertex) = it.split(',').collect(getOrCreate) //(2)
    fromVertex.addEdge('votesFor', toVertex)
  }
}

As you can see in this script, look at the line marked (2):

it.split(',').collect(getOrCreate)

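On this line, each row of the CSV file is split on the delimiter "," and getOrCreate is then called on every resulting value, so each value is looked up through the index or created as a vertex.

If I run g.V().count(), it counts the values from all columns. But I only need the selected columns to be added as vertices.

What I need: I want to apply getOrCreate only to selected columns, not to all of them.

For example: suppose the CSV file has the columns name, age, Id and marks. I only want to apply getOrCreate to the name and age columns and add those as vertices. If I then run g.V().count(), it must give me the count for name and age only.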

Answer 1

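The example you provided resembles the bulk-loading example from the Powers of Ten blog post. That blog post over-simplifies some CSV loading concepts in order to make the point that a simple Groovy script is the best way to load a small graph. Its logic is also tightly bound to the wikivote data, which is an edge list containing nothing but user identifiers.

If you have more complex loading logic, or a CSV file with more columns than you want to load, then you need to expand on the starting point given in the blog post. How you do that depends on the structure of your CSV file. Assuming it is still just an edge list like the wikivote data, but with more columns for each vertex of every edge: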

// Look up a vertex by userId; create it with the selected properties if it is missing
getOrCreate = { id,name,age ->
  def p = g.V('userId', id)
  p.hasNext() ? p.next() : g.addVertex([userId:id, userName:name, userAge:age])
}

new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")){
    def row = it.split('\t')
    // pass only the columns to load; columns 2, 4 and 7 are ignored
    def fromVertex = getOrCreate(row[0],row[1],row[3])
    def toVertex = getOrCreate(row[5],row[6],row[8])
    fromVertex.addEdge('votesFor', toVertex)
  }
}

g.commit()

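So instead of having Groovy turn each line of the CSV file directly into vertices, we split the line into a list of columns. We then call getOrCreate for the "fromVertex" and the "toVertex" with only the columns we need (I made assumptions about how your data is structured, so hopefully you can see how I was able to ignore certain columns in this code). If your CSV file is very complex, you might want to consider getting some help from groovycsv, a very nice parsing library that can simplify your code a bit.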
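The column-selection idea can be sketched without any graph library. This is only an illustration: selectColumns is a hypothetical helper name, and the comma-separated layout (name, age, Id, marks) is assumed from the example in the question rather than taken from a real file. Only the values it returns would then be passed on to getOrCreate.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: split one CSV line and keep only the wanted columns.
// Naive split: it does not handle quoted fields that contain the delimiter.
def selectColumns = { String line, String delimiter, List<Integer> indices ->
    // limit -1 keeps trailing empty fields so column positions stay stable
    def row = line.split(Pattern.quote(delimiter), -1)
    indices.collect { row[it].trim() }
}

// Assumed layout from the question: name, age, Id, marks
def lines = [
    '# name,age,Id,marks',
    'alice,30,1,88',
    'bob,25,2,91'
]

// Skip comment lines, then keep only name (index 0) and age (index 1)
def dataLines = lines.findAll { !it.startsWith('#') }
def selected = dataLines.collect { selectColumns(it, ',', [0, 1]) }

selected.each { println it }
```

With this shape, the Id and marks values never reach getOrCreate, so they would not be counted by a later vertex count.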

Note that this code (and the blog post) is based on TinkerPop 2.x and Titan 0.5.x. Obviously, the Gremlin syntax for "addVertex" will have to be adjusted for TinkerPop 3.x if you need that.
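For TinkerPop 3.x, one possible adaptation is sketched below. It is not the answer's original code: it assumes TinkerGraph (as in the question), is meant for the Gremlin Console or a script with tinkergraph-gremlin on the classpath, and uses an in-memory list of tab-separated lines in place of the file. The 'user' label, the property names and the sample rows are all made up for illustration.

```groovy
// Sketch for TinkerPop 3.x; the imports are already implicit in the Gremlin Console.
import org.apache.tinkerpop.gremlin.structure.Vertex
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class)
g = graph.traversal()

// Look the vertex up by userId; create it from the selected columns if absent.
// The 'user' label and the property names are assumptions for illustration.
getOrCreate = { id, name, age ->
    g.V().has('userId', id).tryNext().orElseGet {
        g.addV('user').property('userId', id).
          property('userName', name).
          property('userAge', age).next()
    }
}

// Stand-in for the file contents. Assumed tab-separated layout:
// fromId, fromName, (ignored), fromAge, (ignored), toId, toName, (ignored), toAge
lines = [
    "# comment line",
    "1\talice\tx\t30\ty\t2\tbob\tz\t25",
    "2\tbob\tz\t25\ty\t3\tcarol\tw\t41"
]

lines.each {
    if (!it.startsWith('#')) {
        def row = it.split('\t')
        // only the selected columns are passed on; the others are dropped
        def fromVertex = getOrCreate(row[0], row[1], row[3])
        def toVertex = getOrCreate(row[5], row[6], row[8])
        fromVertex.addEdge('votesFor', toVertex)
    }
}
```

Here g.V().count() reflects only distinct userId values, because the ignored columns are never turned into vertices.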

csv

groovy

gremlin

titan