Jolt reference first element in array as target name

Question

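I have been looking into this for a few weeks (in the background) and am stumped on how to convert near-CSV JSON data into a tagged set using the NiFi JoltTransformJson processor. By that I mean using the data from the first row of the input array as the JSON object names in the output.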

As an example, I have this input data:

[
  [
    "Company",
    "Retail Cost",
    "Percentage"
  ],
  [
    "ABC",
    "5,368.11",
    "17.09%"
  ],
  [
    "DEF",
    "101.47",
    "0.32%"
  ],
  [
    "GHI",
    "83.79",
    "0.27%"
  ]
]

The output I would like to get is:

[
  {
    "Company": "ABC",
    "Retail Cost": "5,368.11",
    "Percentage": "17.09%"
  },
  {
    "Company": "DEF",
    "Retail Cost": "101.47",
    "Percentage": "0.32%"
  },
  {
    "Company": "GHI",
    "Retail Cost": "83.79",
    "Percentage": "0.27%"
  }
]

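I see this as primarily two problems: getting access to the contents of the first array, and then making sure that the output data does not include that first array.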
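To pin down the target behavior independently of Jolt, the mapping can be written as a minimal Python sketch. This is only a reference for the expected output, not a Jolt spec, and the function name `rows_to_records` is made up for illustration.

```python
import json


def rows_to_records(table):
    """Use the first row as keys and turn every later row into an object."""
    if not table:
        return []
    # Problem 1: read the header row. Problem 2: leave it out of the output.
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


data = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]

records = rows_to_records(data)
print(json.dumps(records, indent=2))
```

The Jolt spec has to do the same two things: look up the header row by position, and keep that row out of the result.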

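I would love to post a Jolt spec showing that I am somewhat close, but the closest I have gotten gives me the correct output shape without the correct content. It looks like this: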

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&0"
      }
    }
  }
]

But it produces output like this:

[ {
  "0" : "Company",
  "1" : "Retail Cost",
  "2" : "Percentage"
}, {
  "0" : "ABC",
  "1" : "5,368.11",
  "2" : "17.09%"
}, {
  "0" : "DEF",
  "1" : "101.47",
  "2" : "0.32%"
}, {
  "0" : "GHI",
  "1" : "83.79",
  "2" : "0.27%"
} ]

Obviously the object names are wrong, and the output has one element too many.

Answer 1

It can be done, but wow, it is hard to read and looks like a terrible regex.

Spec

[
  {
    // this does most of the work, but produces an output
    //  array with a null in the zeroth slot.
    "operation": "shift",
    "spec": {
      // match the first item in the outer array and do 
      //  nothing with it, because it is just "header" data
      //   e.g. "Company", "Retail Cost", "Percentage".
      // we need to reference it, but not pass it thru
      "0": null,
      // 
      // loop over all the rest of the items in the outer array
      "*": {
        // this is rather confusing
        // "*" means match the array indices of the inner array
        // and we will write the value at that index "ABC" etc
        // to "[&1].@(2,[0].[&])"
        // "[&1]" means make the output be an array, and at index
        //   &1, which is the index of the outer array we are
        //   currently in.
        // Then "lookup the key" (Company, Retail Cost) using
        //  @(2,[0].[&])
        // Which is go back up the tree to the root, then 
        //  come back down into the first item of the outer array
        //  and index it by the array index of the current
        //  inner array that we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // We know the first item in the array will be null / junk,
    //  because the first item in the input array was "header" info.
    // So we match the first item, and then accumulate everything
    //  into a new array
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
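As a rough analogy for the two steps, here is a plain-Python sketch of what each shift is meant to do. It is not how Jolt evaluates the spec, and the names `first_shift` and `second_shift` are made up for illustration. The intermediate value with a null at index 0 follows the comment in the spec above rather than a captured Jolt run.

```python
def first_shift(table):
    """Mimic the first shift: keep outer indices, key values by header name."""
    if not table:
        return []
    header = table[0]
    out = [None] * len(table)  # index 0 stays None, like the header slot
    for i, row in enumerate(table):
        if i == 0:
            continue  # like "0": null, the header row is not passed through
        # like "[&1].@(2,[0].[&])": outer index i, key looked up in header by j
        out[i] = {header[j]: value for j, value in enumerate(row)}
    return out


def second_shift(items):
    """Mimic the second shift: skip index 0, collect the rest in a new array."""
    return [item for i, item in enumerate(items) if i != 0]


data = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
]

intermediate = first_shift(data)
result = second_shift(intermediate)
print(intermediate)
print(result)
```

Splitting the work this way keeps the header lookup simple: the first pass can address rows by their original index, and only the second pass has to renumber the output.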


json

jolt

apache-nifi