U-SQL - Extract data from complex nested JSON file
My JSON structure is as follows:
{
    "First":"xxxx",
    "Country":"XX",
    "Loop": {
        "Links": [
            {
                "Url":"xxxx",
                "Time":123
            }, {
                "Url":"xxxx",
                "Time":123
            }],
        "TotalTime":123,
        "Date":"2018-04-09T10:29:39.0233082+00:00"
    }
}
I want to extract these properties:
First
Country
Url & Time for each object in the array
TotalTime
Date
Here is my query:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@extraction =
    EXTRACT jsonString string
    FROM @"/storage-api/input.json"
    USING Extractors.Tsv(quoting:false);

@cleanUp =
    SELECT jsonString
    FROM @extraction
    WHERE (!jsonString.Contains("Part: h") AND jsonString != "465}");

@jsonify =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj
    FROM @cleanUp;

@columnized =
    SELECT obj["First"] AS first,
           obj["Country"] AS country
    FROM @jsonify;

OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();
But this query only extracts the first 2 properties, and I don't know how to query the nested data inside "Loop".
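You can do this with the MultiLevelJsonExtractor and a JSON path, e.g. Loop.Links[*]. The MultiLevelJsonExtractor has a nice feature: if it can't find a node at your base path, it checks for it recursively. I'm not sure how it performs on large JSON documents or on large numbers of JSON documents, though.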
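To make the row-path idea concrete, here is a small Python model of the row shape you should expect: one row per element of Loop.Links, with the outer properties repeated on every row. This is only an illustration of the lookup behaviour described above, not the extractor's actual code; the function name flatten_links and the innermost-first scope order are assumptions made for the sketch.

```python
import json

# Sample document from the question (closing brace included).
DOC = json.loads("""
{
  "First": "xxxx",
  "Country": "XX",
  "Loop": {
    "Links": [
      {"Url": "xxxx", "Time": 123},
      {"Url": "xxxx", "Time": 123}
    ],
    "TotalTime": 123,
    "Date": "2018-04-09T10:29:39.0233082+00:00"
  }
}
""")

COLUMNS = ["First", "Country", "Date", "Url", "Time", "TotalTime"]


def flatten_links(doc, columns):
    """Return one dict per element of doc["Loop"]["Links"].

    Hypothetical helper: each column is looked up on the link itself
    first, then on the enclosing objects (Loop, then the root).
    Columns found nowhere are set to None.
    """
    loop = doc["Loop"]
    rows = []
    for link in loop["Links"]:
        scopes = [link, loop, doc]  # innermost to outermost
        row = {}
        for name in columns:
            row[name] = next((s[name] for s in scopes if name in s), None)
        rows.append(row)
    return rows


rows = flatten_links(DOC, COLUMNS)
for row in rows:
    print(row)
```

With the sample document this gives two rows, one per link; Url and Time come from the link, TotalTime and Date from Loop, and First and Country from the root.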
Try this:
DECLARE @input string = "/input/input65.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@result =
    EXTRACT First string,
            Country string,
            Date DateTime,
            Url string,
            Time string,
            TotalTime int
    FROM @input
    // Row path: one output row per element of Loop.Links
    USING new MultiLevelJsonExtractor("Loop.Links[*]",
                                      false,
                                      // One JSON path per column, in EXTRACT order
                                      "First",
                                      "Country",
                                      "Date",
                                      "Url",
                                      "Time",
                                      "TotalTime"
                                     );

OUTPUT @result
TO "/output/output.csv"
USING Outputters.Csv();
My results: [image]
HTH