Migrate Graphs in JanusGraph Between Instances

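My team is looking into migrating JanusGraph data between instances (we run JanusGraph on top of Google Cloud BigTable) using two different approaches:

1. Export the graph as a GraphML file and import it into the other instance
2. Export the underlying BigTable table and import it into the table underlying the other instance

However, we have run into problems with each approach: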

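1. Our graph is very large, and during the export we keep hitting java.io.IOException: Connection reset by peer, even after raising the Gremlin Server timeout to more than 20 minutes
2. We tried exporting the BigTable table through Cloud Dataflow in 3 different formats (as suggested here), and each of them ran into a different problem:
- Avro format: after exporting the Avro files, re-importing them into a new table fails with the following error: Error message from worker: java.io.IOException: At least 8 errors occurred writing to Bigtable. First 8 errors: Error mutating row ( ;�! with mutations [set cell ....] .... Caused by: java.lang.NullPointerException. Since JanusGraph stores binary data in BigTable, the Dataflow job may not be able to export it to Avro files correctly
- SequenceFile format: re-importing these files fails with the following error: Error message from worker: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 310 actions: StatusRuntimeException: 310 times, servers with issues: batch-bigtable.googleapis.com
- Parquet format: this turned out to be the most promising, and the import mostly completed (apart from the error Root cause: The worker lost contact with the service. seen on the Dataflow workers during scale-down). After re-importing into the target table, the data is generally intact. However, the indexes seem "quirky" after the import (for example, when querying for a specific node with a has() filter on an indexed property, the query completes quickly but returns no results)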

Any opinions or input on the issues above would be much appreciated. Thanks!

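So the problem here appears to be that Dataflow fails on mutation requests that contain more than 100k mutations for a single row (due to a BigTable limit). However, a newer version of the ParquetToBigTable template provided by Google seems to have a new parameter called "splitLargeRows", which helps split large rows so that the number of mutations per request stays <= 100k.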
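
To make the idea of splitting a large row concrete, here is a minimal Python sketch of the general technique. It is not the template's actual implementation, which I have not inspected. The Mutation tuple, the column family name "e", and the split_row_mutations helper are all hypothetical names used only for illustration; the 100,000 figure is the per-row limit mentioned above.

```python
from typing import Iterator, List, Tuple

# Hypothetical representation of one cell write: (column family, qualifier, value).
Mutation = Tuple[str, bytes, bytes]

# Per-row mutation limit described in the answer above.
MAX_MUTATIONS_PER_REQUEST = 100_000


def split_row_mutations(
    mutations: List[Mutation],
    limit: int = MAX_MUTATIONS_PER_REQUEST,
) -> Iterator[List[Mutation]]:
    """Yield consecutive chunks of at most `limit` mutations for one row."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(mutations), limit):
        yield mutations[start:start + limit]


# Example: a very wide row with 250,000 cells is sent as three requests.
wide_row = [("e", str(i).encode(), b"") for i in range(250_000)]
chunk_sizes = [len(chunk) for chunk in split_row_mutations(wide_row)]
print(chunk_sizes)  # [100000, 100000, 50000]
```

One trade-off to keep in mind with any approach like this: once a row is written in several requests, the write to that row is presumably no longer applied as a single atomic operation, so a failed job can leave a row partially imported.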

google-cloud-bigtable

janusgraph