Loading data into Titan with bulbs and then accessing it

Question

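I am new to graph databases and to the whole Titan ecosystem, so please excuse me if this sounds stupid. I am also struggling with the lack of documentation -_-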

I have installed Titan Server. I am using Cassandra as the backend.

I am trying to load basic Twitter data into Titan using Python. I am using the bulbs library for this. Let's say I have a list of the people I follow on Twitter in the friends list.

My Python script looks like this:

from bulbs.titan import Graph
# some other imports here

# getting the *friends* list for a specified user here


g = Graph()

# a vertex of a specified user
center = g.vertices.create(name = 'sergiikhomenko')


for friend in friends:
    cur_friend = g.vertices.create(name = friend)
    g.edges.create(center,'follows',cur_friend)

As I understand it, the code above should create a graph in Titan with a number of vertices, some of which are connected by edges.

My questions are:

How do I save it in Titan?? (like a commit in SQL)

How do I access it later?? Should I be able to access it through gremlin shell?? If yes, how??

My next question would be about visualizing the data, but I am still far from that :)

Please help :) I am completely lost in all this Titan, Gremlin, Rexster, etc. :)

UPDATE: One of the requirements of our POC project is... python :), which is why I jumped straight into bulbs. I will definitely follow the advice below, though :)

Answer 1

My answer will be somewhat incomplete, because I can't really speak to bulbs, but you did ask some specific questions that I can try to answer:

How do I save it in Titan?? (like a commit in SQL)

In Java/Groovy it's just g.commit().
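On the Python side, here is a minimal sketch of the loading step from the question, wrapped in a function. It rests on an assumption worth verifying against your own setup: that bulbs talks to Titan Server through Rexster's REST interface and that each request is committed on its own, so the script has no separate commit call. The function name `load_follows` is made up for illustration; only `vertices.create` and `edges.create` come from the question's code.

```python
def load_follows(graph, user, friends):
    """Create one vertex per name and a 'follows' edge from user to each friend.

    `graph` is assumed to be a bulbs Graph, e.g. the `g = Graph()` from the
    question. Assumption: every create() call is sent to the server as its
    own request and committed there, so this sketch has no explicit commit.
    """
    center = graph.vertices.create(name=user)
    edges = []
    for friend in friends:
        cur_friend = graph.vertices.create(name=friend)
        edges.append(graph.edges.create(center, 'follows', cur_friend))
    # Return the centre vertex so the caller can keep its id for later lookups.
    return center, edges
```

Called as `center, edges = load_follows(g, 'sergiikhomenko', friends)`, this does the same work as the loop in the question. Note that running it twice would create the vertices twice, since nothing here checks for existing names.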

How do I access it later?? Should I be able to access it through gremlin shell?? If yes, how??

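Once it is committed to cassandra, it can be accessed from Bulbs, the gremlin shell, or some other application. I'm not sure exactly what you are asking, but I like the Gremlin Console for this sort of thing, so if cassandra is running locally, start bin/gremlin.sh and do: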

// trailing dots keep this a single statement when pasted into the console
g = TitanFactory.build().
    set("storage.backend","cassandra").
    set("storage.hostname","127.0.0.1").
    open()

This connects you to cassandra, and you should then be able to query your data.
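Since the question is about Python, reading the data back through bulbs might look roughly like the sketch below. I can't vouch for the details of bulbs, so treat these as assumptions: that `graph.vertices.get(eid)` returns the vertex for an id (or None), that `outV('follows')` yields the vertices at the other end of outgoing 'follows' edges (or None when there are none), and that properties such as `name` are readable as attributes. `followed_names` is a hypothetical helper name.

```python
def followed_names(graph, eid):
    """Return the names of the vertices that vertex `eid` follows.

    `graph` is assumed to be a bulbs Graph; `eid` is the id that was
    reported for the vertex when it was created (e.g. center.eid).
    """
    vertex = graph.vertices.get(eid)
    if vertex is None:
        return []
    # Assumption: outV() may return None instead of an empty iterator.
    neighbours = vertex.outV('follows') or []
    return [v.name for v in neighbours]
```

This relies on having kept the vertex id from the loading run. Looking a vertex up by name instead would need an index on `name`, which is not covered here.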

I am completely lost in all this Titan, Gremlin, Rexster, etc.

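My advice to all new users (especially those new to graphs, cassandra, the jvm, etc.) is to slow down. The fastest way to get discouraged is to try to go from python to bulbs to rexster to gremlin to titan to a cassandra cluster hosted in EC2 with hadoop, and then try to load a billion-edge graph into it.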

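If you are new, start with the latest stuff: TinkerPop3 - http://tinkerpop.incubator.apache.org/ - which bulbs does not yet support. That's ok, because you are learning TinkerPop, which is important for learning the whole stack and all implementations of TinkerPop (e.g. Titan). Use TinkerGraph (not Titan) with a small subset of your data, and make sure you have a pattern for loading that small subset before you try to go full scale. Use the Gremlin Console for everything related to this initial goal. That is a recipe for an easy win. With this approach you will likely be running some queries over your own data in a graph within a day, and you will have learned most of what you need to do the same with Titan.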
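One way to carve out that small subset on the Python side, before any graph backend is involved, is to reduce the friends list to a handful of plain (source, label, target) triples. This is only a sketch: `sample_follow_edges`, its `size` and `seed` parameters, and the triple layout are all made up for illustration and are not part of bulbs, TinkerPop or Titan.

```python
import random


def sample_follow_edges(user, friends, size=20, seed=0):
    """Pick a small, repeatable sample of 'follows' edges as plain triples."""
    # De-duplicate and sort first so the same seed gives the same sample.
    names = sorted(set(friends))
    rng = random.Random(seed)
    subset = rng.sample(names, min(size, len(names)))
    return [(user, 'follows', name) for name in subset]
```

Because the sample is repeatable, the same triples can be loaded again and again while you work out the loading pattern, and only later swapped for the full list.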

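Once you have your graph, get it working in Gremlin Server (the Rexster replacement for TP3). Then think about how you might access it through python tooling. Or maybe you figure out how to convert your TinkerGraph to Titan (perhaps starting with BerkeleyDB rather than cassandra). My point here is to increase your involvement with the different parts of the ecosystem more slowly, because otherwise it gets overwhelming.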
