How do I save lineage info in Apache Atlas when using Apache Cassandra and Elasticsearch?

Question

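I am planning to deploy Apache Atlas with Apache Cassandra as the storage backend and Elasticsearch as the index backend. How do I save lineage information with this setup? There is a GET API for retrieving lineage information, but there seems to be no way to save it.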

Answer 1

In Atlas, lineage is created when entities are linked through a process by way of its inputs and outputs.

Example: if you want to see lineage between two entities of type hive_table, it would look like this:

T1(hive_table)--->P1(hive_process)--->T2(hive_table)

So, basically, the entities need to be linked through a process type.

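In Atlas, a process is itself an entity. It can be created with the API POST: /v2/entity, with the inputs and outputs defined in the request body, as for the hive_process above: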

POST: /api/atlas/v2/entity
    {
      "entity": {
        "typeName": "hive_process",
        "attributes": {
          "outputs": [
            {
              "guid": "2", 
              "typeName": "hive_table",
              "uniqueAttributes": {
                "qualifiedName": "t2@primary"
              }
            }
          ],
          "qualifiedName": "p1@primary",
          "inputs": [
            {
              "guid": "1",
              "typeName": "hive_table",
              "uniqueAttributes": {
                "qualifiedName": "t1@primary"
              }
            }
          ],
          "name": "P1-Process"
        }
      }
    }

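An important thing to note before creating the process: the referenced entities (inputs and outputs) must already exist, otherwise process creation will fail.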
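The request above can also be assembled and sent from a script. The following is only a minimal sketch, not an official client: the base URL `http://localhost:21000` and the `admin`/`admin` credentials are assumptions about a local test installation, and the helper names (`object_ref`, `build_process_entity`, `post_entity`) are hypothetical. It references the inputs and outputs by `qualifiedName` rather than by GUID, and it assumes both tables already exist. Depending on the type definition, a real `hive_process` may require further mandatory attributes.

```python
import base64
import json
import urllib.request

# Assumption: address of a local Atlas test instance.
ATLAS_URL = "http://localhost:21000"


def object_ref(type_name, qualified_name):
    """Reference an existing entity by its type and unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(name, qualified_name, inputs, outputs,
                         type_name="hive_process"):
    """Build the JSON body for POST /api/atlas/v2/entity."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


def post_entity(payload, user="admin", password="admin"):
    """Send the payload to Atlas. The credentials are placeholder assumptions."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        ATLAS_URL + "/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build the T1 -> P1 -> T2 lineage link from the example.
payload = build_process_entity(
    name="P1-Process",
    qualified_name="p1@primary",
    inputs=[object_ref("hive_table", "t1@primary")],
    outputs=[object_ref("hive_table", "t2@primary")],
)
print(json.dumps(payload, indent=2))
# To actually create the process, call post_entity(payload) against a running Atlas.
```

Once the process entity has been created, the lineage between t1 and t2 should show up through the lineage GET API mentioned in the question.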

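If your requirements are not covered by the pre-existing types, you can certainly go ahead and define your own types for Atlas entities and processes.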
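As a rough illustration of what custom types could look like, the sketch below builds a type-definition body that would be sent to POST /api/atlas/v2/types/typedefs. The type names `my_table` and `my_process` and the helper `entity_def` are hypothetical. The sketch assumes the built-in `DataSet` and `Process` supertypes are available, so that the custom process type inherits its `inputs` and `outputs` attributes from `Process`.

```python
import json


def entity_def(name, super_type, attribute_defs=None):
    """Describe a custom entity type that extends a built-in supertype."""
    return {
        "category": "ENTITY",
        "name": name,
        "superTypes": [super_type],
        "attributeDefs": attribute_defs or [],
    }


# Hypothetical custom types: a data set type and a process type linking data sets.
typedefs = {
    "entityDefs": [
        entity_def("my_table", "DataSet"),
        entity_def("my_process", "Process"),
    ]
}
print(json.dumps(typedefs, indent=2))
```

Entities of these types could then be created and linked in the same way as the hive_table and hive_process example, with `typeName` set to the custom type names.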

More about the Atlas type system on the Apache site.

cassandra

apache-atlas