U-SQL - Extract data from complex nested json file

My JSON structure looks like this:

{
    "First": "xxxx",
    "Country": "XX",
    "Loop": {
        "Links": [
            {
                "Url": "xxxx",
                "Time": 123
            },
            {
                "Url": "xxxx",
                "Time": 123
            }
        ],
        "TotalTime": 123,
        "Date": "2018-04-09T10:29:39.0233082+00:00"
    }
}

I want to extract these properties:

First
Country
Url and Time for each object in the Links array
TotalTime
Date

Here is my query:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

@extration = 
EXTRACT 
    jsonString string 
FROM @"/storage-api/input.json" 
USING Extractors.Tsv(quoting:false);

@cleanUp = SELECT jsonString FROM @extration WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj FROM @cleanUp;

@columnized = SELECT 
        obj["First"] AS first,
        obj["Country"] AS country
FROM @jsonify;

OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();

But this query only extracts the first two properties, and I don't know how to get at the nested data under "Loop".
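To see why only First and Country come out, it helps to look at what a flat, one-level parse returns. The Python sketch below is a hypothetical stand-in for JsonTuple: the helpers `as_text` and `json_tuple` are mine, and I'm assuming that nested objects and arrays come back as JSON text rather than as further maps. Under that assumption, every level of nesting has to be parsed again before Url and Time are reachable.

```python
import json

# The question's document on one line (as a line-based Tsv extract would see it).
RAW = ('{"First":"xxxx","Country":"XX","Loop":{"Links":[{"Url":"xxxx","Time":123},'
       '{"Url":"xxxx","Time":123}],"TotalTime":123,'
       '"Date":"2018-04-09T10:29:39.0233082+00:00"}}')


def as_text(value):
    """Strings stay as they are; numbers, objects and arrays become JSON text."""
    return value if isinstance(value, str) else json.dumps(value)


def json_tuple(text):
    """Hypothetical JsonTuple stand-in: a flat map of top-level keys to strings."""
    return {key: as_text(value) for key, value in json.loads(text).items()}


obj = json_tuple(RAW)            # level 1: First, Country, and Loop (still JSON text)
loop = json_tuple(obj["Loop"])   # level 2: Links (JSON text), TotalTime, Date
links = [json_tuple(as_text(item)) for item in json.loads(loop["Links"])]  # level 3

# One row per Links element, carrying the parent-level values along.
rows = [(obj["First"], obj["Country"], link["Url"], link["Time"],
         loop["TotalTime"], loop["Date"]) for link in links]
for row in rows:
    print(row)
```

Each extra level of nesting costs another parse step, which is the bookkeeping a row-path-based extractor does for you.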

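You can do this with the MultiLevelJsonExtractor (documented here) and a JSON path such as Loop.Links[*]. A nice feature of MultiLevelJsonExtractor is that if a node isn't found under your base path, it looks for it recursively, although I'm not sure how well that performs on large JSON documents or on a large number of JSON documents.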
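The row-path idea can be sketched in Python to make the semantics concrete. This is an illustration under assumptions, not the extractor's actual code: `select_rows`, `lookup` and `extract` are hypothetical helpers, only dotted paths ending in [*] are handled, and a property missing from the array element is searched for on the enclosing objects from the nearest one up to the root, which is how I read the "looks for it recursively" behaviour.

```python
import json

# Sample document from the question.
DOC = json.loads("""
{
  "First": "xxxx",
  "Country": "XX",
  "Loop": {
    "Links": [
      {"Url": "xxxx", "Time": 123},
      {"Url": "xxxx", "Time": 123}
    ],
    "TotalTime": 123,
    "Date": "2018-04-09T10:29:39.0233082+00:00"
  }
}
""")


def select_rows(doc, row_path):
    """Resolve a simple dotted path ending in [*], e.g. "Loop.Links[*]".

    Returns (row_node, ancestors) pairs, where ancestors lists the enclosing
    objects from the nearest one up to the root. Only this limited syntax is handled.
    """
    assert row_path.endswith("[*]"), "only paths ending in [*] are handled"
    node, ancestors = doc, []
    for key in row_path[:-3].split("."):
        ancestors.insert(0, node)  # remember the object we are stepping out of
        node = node[key]
    return [(item, ancestors) for item in node]


def lookup(row, ancestors, name):
    """Look for `name` on the row first, then on each enclosing object in turn."""
    for node in [row, *ancestors]:
        if isinstance(node, dict) and name in node:
            return node[name]
    return None  # assumption for this sketch: missing properties become None


def extract(doc, row_path, fields):
    """One output tuple per element selected by row_path."""
    return [tuple(lookup(row, anc, f) for f in fields)
            for row, anc in select_rows(doc, row_path)]


FIELDS = ["First", "Country", "Date", "Url", "Time", "TotalTime"]
for r in extract(DOC, "Loop.Links[*]", FIELDS):
    print(r)
```

Running it prints one tuple per element of Links, each carrying the parent-level First, Country, Date and TotalTime values alongside that element's Url and Time.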

Try this:

DECLARE @input string = "/input/input65.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@result =
    EXTRACT First string,
            Country string,
            Date DateTime,
            Url string,
            Time string,
            TotalTime int
    FROM @input
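    USING new MultiLevelJsonExtractor(
          "Loop.Links[*]",  // row path: one output row per element of the Links array
          false,            // bypassWarning
          "First",          // the remaining arguments are the properties to extract;
          "Country",        // those not found on the Links element (First, Country,
          "Date",           // Date, TotalTime) are looked up further up the document
          "Url",
          "Time",
          "TotalTime"
          );
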
OUTPUT @result
TO "/output/output.csv"
USING Outputters.Csv();

HTH