Jolt: reference the first element in an array as the target name
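I've been working on this for a few weeks (in the background) and am stumped on how to use the NiFi JoltTransformJson processor to convert JSON data that is close to CSV into a tagged set. By that I mean using the data in the first row of the input array as the JSON object names in the output.
For example, I have this input data: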
[
[
"Company",
"Retail Cost",
"Percentage"
],
[
"ABC",
"5,368.11",
"17.09%"
],
[
"DEF",
"101.47",
"0.32%"
],
[
"GHI",
"83.79",
"0.27%"
]
]
The output I'd like to get is:
[
{
"Company": "ABC",
"Retail Cost": "5,368.11",
"Percentage": "17.09%"
},
{
"Company": "DEF",
"Retail Cost": "101.47",
"Percentage": "0.32%"
},
{
"Company": "GHI",
"Retail Cost": "83.79",
"Percentage": "0.27%"
}
]
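I think this is mainly two problems: getting access to the contents of the first array, and then making sure the output data doesn't include that first array.
I'd love to post a Jolt spec showing that I got somewhat close, but my closest attempt gives me the correct output shape without the correct contents. It looks like this: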
[
{
"operation": "shift",
"spec": {
"*": {
"*": "[&1].&0"
}
}
}
]
But it produces output like this:
[ {
"0" : "Company",
"1" : "Retail Cost",
"2" : "Percentage"
}, {
"0" : "ABC",
"1" : "5,368.11",
"2" : "17.09%"
}, {
"0" : "DEF",
"1" : "101.47",
"2" : "0.32%"
}, {
"0" : "GHI",
"1" : "83.79",
"2" : "0.27%"
} ]
Obviously the object names are wrong, and there is one element too many in the output.
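The numeric names come from `&0`: in a shift spec it expands to the key matched at the current level, and for an array element that key is its index. A rough Python emulation of what `"*": {"*": "[&1].&0"}` does with this kind of input (plain Python for illustration only, not Jolt; `attempt_shift` is a made-up name):

```python
def attempt_shift(rows):
    # Emulates "[&1].&0": the outer index picks the output array slot,
    # the inner index (as a string) becomes the field name.
    # The header row is treated like any other row, so it is not skipped.
    return [{str(i): value for i, value in enumerate(row)} for row in rows]


rows = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
]
result = attempt_shift(rows)
```

So the spec needs a way to look up the header text instead of reusing the index, and a way to leave the header row out.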
It can be done, but wow, it is hard to read / looks like a terrible regex.
Spec
[
{
// this does most of the work, but produces an output
// array with a null at index zero.
"operation": "shift",
"spec": {
// match the first item in the outer array and do
// nothing with it, because it is just "header" data
// e.g. "Company", "Retail Cost", "Percentage".
// we need to reference it, but not pass it thru
"0": null,
//
// loop over all the rest of the items in the outer array
"*": {
// this is rather confusing
// "*" means match the array indices of the inner array
// and we will write the value at that index ("ABC" etc.)
// to "[&1].@(2,[0].[&])"
// "[&1]" means make the output an array, and write at index
// &1, which is the index of the outer array we are
// currently in.
// Then "look up the key" (Company, Retail Cost) using
// @(2,[0].[&])
// which means: go back up the tree to the root, then
// come back down into the first item of the outer array
// and index it by the array index of the current
// inner array element that we are at.
"*": "[&1].@(2,[0].[&])"
}
}
},
{
// We know the first item in the array will be null / junk,
// because the first item in the input array was "header" info.
// So we match the first item and discard it, then accumulate
// everything else into a new array
"operation": "shift",
"spec": {
"0": null,
"*": "[]"
}
}
]
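To check the spec's output, it can help to have the same two steps written out in ordinary code. Below is a minimal Python sketch of what the two shift operations do to the sample input. It is only an emulation for comparison, not how Jolt is implemented, and the function names are made up for this example.

```python
def shift_with_header_lookup(rows):
    # Step 1: row 0 is matched by "0": null and writes nothing, which
    # leaves a null (None) at index 0 of the output array. Every other
    # row becomes an object whose keys are looked up in the header row,
    # mirroring "[&1].@(2,[0].[&])".
    header = rows[0]
    out = [None] * len(rows)
    for r in range(1, len(rows)):
        out[r] = {header[c]: value for c, value in enumerate(rows[r])}
    return out


def drop_first(items):
    # Step 2: skip index 0 and collect the rest into a new array,
    # mirroring "0": null followed by "*": "[]".
    return items[1:]


rows = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
intermediate = shift_with_header_lookup(rows)
result = drop_first(intermediate)
```

Here `intermediate` still has the placeholder at index 0, which is why the second shift is needed; `result` holds one object per data row.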