Talend extract JSON with a strange format

I have a problem with Talend: I need to extract JSON in a very strange format, which looks like this:

{"results":[{"id":0,"series":[{"name":"table1","columns":["column1","column2","column3","column4"],"values":[["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"]]}]}]}

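In the "series" object there is a "columns" array containing the column names and a "values" array containing the values of each row. The desired output is a table/CSV/JSON in a more conventional format, i.e. field/value pairs. Does anyone know how I can do this? So far I have tried extracting the various JSON fields, but the output looks like this: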

Columns
Column1
Column2
Column3
Column4
values
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]

(For this one, I think I may have to extract another JSON field.)

Thanks, everyone

PS: I added Talend to the post.

You didn't specify a language, so I guess any language will do? This PHP script

<?php

$js=<<<'JS'
{
    "results": [{
        "id": 0,
        "series": [{
            "name": "table1",
            "columns": ["column1", "column2", "column3", "column4"],
            "values": [
                ["Value1", "Value2", "Value3", "Value4"],
                ["Value1", "Value2", "Value3", "Value4"],
                ["Value1", "Value2", "Value3", "Value4"],
                ["Value1", "Value2", "Value3", "Value4"],
                ["Value1", "Value2", "Value3", "Value4"]
            ]
        }]
    }]
}
JS;
// Decode the JSON into nested associative arrays
$data=json_decode($js,true);
$extracted=array();
foreach($data['results'] as $result){
    foreach($result['series'] as $serie){
        // Each row in "values" becomes one associative array
        foreach($serie['values'] as $values){
            $extract=[];
            foreach($values as $valueKey=>$value){
                // Use the column name at the same position as the key
                $extract[$serie["columns"][$valueKey]]=$value;
            }
            $extracted[]=$extract;
        }
    }
}
echo json_encode($extracted,JSON_PRETTY_PRINT);

outputs

[
    {
        "column1": "Value1",
        "column2": "Value2",
        "column3": "Value3",
        "column4": "Value4"
    },
    {
        "column1": "Value1",
        "column2": "Value2",
        "column3": "Value3",
        "column4": "Value4"
    },
    {
        "column1": "Value1",
        "column2": "Value2",
        "column3": "Value3",
        "column4": "Value4"
    },
    {
        "column1": "Value1",
        "column2": "Value2",
        "column3": "Value3",
        "column4": "Value4"
    },
    {
        "column1": "Value1",
        "column2": "Value2",
        "column3": "Value3",
        "column4": "Value4"
    }
]

Here is a solution that produces the result as a CSV file.

I used tFixedFlowInput_1 and tFixedFlowInput_3 as input for the JSON from your example.
tExtractJSONFields_1 extracts the individual columns from the columns array, which are then denormalized into a file.

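tExtractJSONFields_2 extracts the values as arrays; then, for each array, tExtractJSONFields_3 extracts the individual values, and each set of values is denormalized to obtain the CSV rows in [=34=] (written to the previous file in append mode).

The final result looks like this: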

column1,column2,column3,column4
Value1,Value2,Value3,Value4
Value1,Value2,Value3,Value4
Value1,Value2,Value3,Value4
Value1,Value2,Value3,Value4
Value1,Value2,Value3,Value4

I used a comma as the separator; it can be changed in tDenormalize_1 and tDenormalize_2.
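For comparison outside Talend, here is a minimal PHP sketch of the same two-step flow: the "columns" array is written as a header line (file opened in write mode), then each "values" row is appended (file reopened in append mode), with the separator passed as a parameter, playing the role of the setting in tDenormalize_1/tDenormalize_2. The function name writeSeriesCsv and the temp file path are hypothetical, and the sketch assumes only the first series of the first result needs to be exported.

```php
<?php
// Hypothetical helper (not a Talend component): writes "columns" as a CSV
// header, then appends one CSV line per row of "values".
function writeSeriesCsv(string $json, string $path, string $delimiter = ','): void
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    // Assumption: only the first series of the first result is exported.
    $serie = $data['results'][0]['series'][0];

    // Step 1: header line; "w" creates or truncates the file.
    $out = fopen($path, 'w');
    fputcsv($out, $serie['columns'], $delimiter, '"', '');
    fclose($out);

    // Step 2: value rows; "a" appends to the file written above.
    $out = fopen($path, 'a');
    foreach ($serie['values'] as $row) {
        fputcsv($out, $row, $delimiter, '"', '');
    }
    fclose($out);
}

$js = '{"results":[{"id":0,"series":[{"name":"table1","columns":["column1","column2","column3","column4"],"values":[["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"]]}]}]}';
$path = sys_get_temp_dir() . '/table1.csv';
writeSeriesCsv($js, $path, ';'); // semicolon instead of comma, as an example
echo file_get_contents($path);
```

Writing the header and the rows in two separate passes is only there to mirror the Talend job; in plain PHP a single fopen($path, 'w') for both steps would work just as well.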