How to handle grouping rules

Question

I'm trying to keep some order in my assigned Cosmos space. Currently I'm storing data as shown below:

.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL1_PhysicalTest/TEMPORAL1_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL2_PhysicalTest/TEMPORAL2_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL3_PhysicalTest/TEMPORAL3_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL4_PhysicalTest/TEMPORAL4_PhysicalTest.txt

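Here TEMPORAL1 is my entity ID and PhysicalTest is its type. However, I would like to know the appropriate mechanism for storing data according to the following (hypothetical) structure: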

.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL1_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL2_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL3_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL4_PhysicalTest.txt

I believe this can be solved with grouping rules, though I'm not sure.

If so, I have set up my grouping_rules.conf as follows, but without success: I still end up with the structure shown first.

{
    "grouping_rules": [
        {
            "id": 1,
            "fields": [
                "entityType"
            ],
            "regex": "PhysicalTest.*",
            "destination": "PhysicalTest",
            "fiware_service_path": "/[ Fiware-Service ]/physicaltests"
        }
    ]
}

Answer 1

This cannot be done. Cygnus stores data in HDFS folders following this pattern (*):

/user/<username>/<service>/<service-path>/<entity-id>_<entity-type>/<entity-id>_<entity-type>.txt

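The <entity-id>_<entity-type>/<entity-id>_<entity-type>.txt part of the structure cannot be changed, in the sense that the entity ID and the entity type (either as notified or as mapped, which is explained later) will always be used to compose it. Note that this structure repeats the concatenation of entity ID and type in both the subfolder and the file. Why? Because Hadoop works with directories, not files. The structure was designed this way in Cygnus so that analysis can be run on a single entity.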
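As a rough illustration of the pattern above, here is a minimal Python sketch that composes the path from its parts. The function name `hdfs_file_path` and the sample values are hypothetical and are not part of Cygnus; the sketch only mirrors the layout described in this answer.

```python
def hdfs_file_path(username: str, service: str, service_path: str,
                   entity_id: str, entity_type: str) -> str:
    """Compose a path following the pattern described in the answer.

    Hypothetical helper for illustration only; it is not Cygnus code.
    """
    # The same "<entity-id>_<entity-type>" string names both the
    # subfolder and the file inside it.
    destination = f"{entity_id}_{entity_type}"
    parts = ["user", username, service, service_path.strip("/"),
             destination, f"{destination}.txt"]
    return "/" + "/".join(parts)


# Sample values are made up for the example.
example_path = hdfs_file_path("alice", "myservice", "/mypath",
                              "TEMPORAL1", "PhysicalTest")
print(example_path)
```

Because the subfolder holds exactly one entity's file, a job pointed at that directory reads only that entity's data.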

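That said, the above structure can be altered using Name Mappings, a feature that lets you modify the entity ID and/or the entity type (among other things). This is a very powerful feature, since you can say, for example, "all the entities of type car will see their IDs mapped to a single ID of my choice", which means all those entities will be stored in the same subdirectory/file: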

/user/<username>/<service>/<service-path>/<unique-entity-id>_<entity-type>/<unique-entity-id>_<entity-type>.txt
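To see why mapping every ID of a given type to one ID collapses the storage into a single subdirectory/file, consider the following Python sketch. It only simulates the effect; `map_entity_id`, `destination`, and the chosen ID `physicaltests` are hypothetical names for illustration, and this is not the actual Name Mappings configuration format.

```python
import re


def map_entity_id(entity_id: str, entity_type: str,
                  type_pattern: str, new_id: str) -> str:
    """Return new_id when the entity type matches, else the original ID.

    Simulates the idea of a name mapping; not the real Cygnus logic.
    """
    if re.fullmatch(type_pattern, entity_type):
        return new_id
    return entity_id


def destination(entity_id: str, entity_type: str) -> str:
    """Name shared by the subfolder and the file."""
    return f"{entity_id}_{entity_type}"


# The four entities from the question.
entities = [(f"TEMPORAL{i}", "PhysicalTest") for i in range(1, 5)]

# Map every PhysicalTest entity to one ID of our choice (assumed value).
destinations = {
    destination(map_entity_id(eid, etype, r"PhysicalTest", "physicaltests"),
                etype)
    for eid, etype in entities
}
print(destinations)
```

All four entities end up with the same destination name, so their data would land in one file, although the entity type is still appended to the name.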

I guess this is the closest you can get to what you need.

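What about the Grouping Rules you mention? They are the predecessor of Name Mappings. They let you replace the whole concatenation of entity ID and type (what we call the "destination"), but the structure explained above stays the same: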

/user/<username>/<service>/<service-path>/<destination>/<destination>.txt

Grouping Rules are deprecated in favor of Name Mappings.
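The grouping-rule idea can be sketched in Python as follows. This is a simplified model under stated assumptions: the regex is assumed to be searched within the concatenation of the listed fields, and `resolve_destination` and `relative_path` are hypothetical helpers, not Cygnus functions. The rule values are taken from the question.

```python
import re

# Rule fields copied from the question's grouping_rules.conf.
rules = [
    {"fields": ["entityType"], "regex": "PhysicalTest.*",
     "destination": "PhysicalTest"},
]


def resolve_destination(entity: dict, rules: list) -> str:
    """Return the destination of the first matching rule.

    Falls back to "<entity-id>_<entity-type>" when no rule matches.
    The matching semantics here are an assumption for illustration.
    """
    for rule in rules:
        candidate = "".join(entity[field] for field in rule["fields"])
        if re.search(rule["regex"], candidate):
            return rule["destination"]
    return f"{entity['entityId']}_{entity['entityType']}"


def relative_path(dest: str) -> str:
    """The destination still names both the subfolder and the file."""
    return f"{dest}/{dest}.txt"


entity = {"entityId": "TEMPORAL1", "entityType": "PhysicalTest"}
print(relative_path(resolve_destination(entity, rules)))
```

Even when the rule matches, the result has the form <destination>/<destination>.txt, so a layout such as physicaltests/TEMPORAL1_PhysicalTest.txt is not reachable this way.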

(*) Alternatively, if you configure service_as_namespace = true, the <username> level is omitted. This is useful if your FIWARE service matches a valid HDFS user:

/user/<service>/<service-path>/<entity-id>_<entity-type>/<entity-id>_<entity-type>.txt
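The effect of this option on the base directory can be sketched as below. The function `hdfs_base_dir` and the sample values are hypothetical; only the `service_as_namespace` name comes from the answer.

```python
def hdfs_base_dir(username: str, service: str, service_path: str,
                  service_as_namespace: bool = False) -> str:
    """Build the directory that holds the per-entity subfolders.

    Hypothetical helper mirroring the two layouts in this answer.
    """
    levels = ["user"]
    if not service_as_namespace:
        # Default layout: the HDFS username is its own level.
        levels.append(username)
    levels += [service, service_path.strip("/")]
    return "/" + "/".join(levels)


# Sample values are made up for the example.
default_dir = hdfs_base_dir("alice", "myservice", "/mypath")
namespace_dir = hdfs_base_dir("alice", "myservice", "/mypath",
                              service_as_namespace=True)
print(default_dir)
print(namespace_dir)
```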


fiware-cygnus