U-SQL call to read a JSON array without a key
I have a nested JSON document that I am trying to flatten in U-SQL. I cannot share the data, but the structure is similar to this:
{
    "userlist": [user1, user1],
    "objects": {
        "largeobjects": [object1, object2, object3]
        "smallobjects": [s_object1, s_object2]
    },
    "applications": [{
            "application": sdq3ds5dsa
        }, {
            "application": dksah122j4
        }, {
            "application": sadsw2dq2s
        }, {
            "application": pro3dfdsn3
        }
    ],
    "date" : 12344232,
    "timezone" : "Asia",
    "id" : "sad2ddssa2",
    "admin": {
        "lang": "eng",
        "country": "us",
    }
}
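I am using the custom JSON sample formats library (https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) to extract from the JSON file, and the JsonTuple function to extract the values. My problem is that the function produces a SQL MAP of key/value pairs. That works wherever I have keys, but it throws an error when I try to use the function to get the values out of an array that has no keys.
Any suggestions on how to solve this would be greatly appreciated.
Edit: this is the output I am looking for: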
sad2ddssa2, object1, 12344232, "Asia", "eng", "us",
sad2ddssa2, object2, 12344232, "Asia", "eng", "us"
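First option
Try using PROSE in your U-SQL. Use the PROSE C# NuGet package to process the data and perform complex extractions. PROSE is Microsoft's program-synthesis (programming by example) framework. See the videos and examples here: https://microsoft.github.io/prose
Second option
Create a C# function to process your JSON, for example with the C# JSON APIs. Something like the following; adapt this sample to your own extraction request: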
/* Formats the array of values into a named json array. */
DECLARE @JsonArray Func<SqlArray<string>, string, string> = (data, name) =>
{
    StringBuilder buffer = new StringBuilder();
    buffer.Append("{\r\n\t\"" + name + "\": [\r\n");
    for (int i = 0; i < data.Count(); i++)
    {
        if (i > 0)
        {
            buffer.Append(",\r\n");
        }
        buffer.Append("\t\"" + data[i] + "\"");
    }
    buffer.Append("\r\n\t]\r\n}");
    return buffer.ToString();
};

/* Format the array containing groups of comma separated values into a named json array */
@Query =
    SELECT
        @JsonArray(SubscriptionArray, "subscriptionList") AS JsonArray
    FROM @subscriptionsQuery1;
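One caveat with building the JSON text by string concatenation: a value that contains a double quote or a backslash would come out as invalid JSON, because nothing escapes it. As a language-neutral illustration of the same output shape with escaping delegated to a JSON library, here is a minimal Python sketch. The helper name `to_named_json_array` is hypothetical and is not part of U-SQL or the samples library; inside U-SQL the equivalent would presumably go through a JSON serializer rather than a StringBuilder.

```python
import json


def to_named_json_array(values, name):
    """Serialize values as {"<name>": [...]} with proper JSON escaping."""
    # json.dumps escapes quotes, backslashes and control characters,
    # which manual string concatenation does not do.
    return json.dumps({name: list(values)}, indent=2)


if __name__ == "__main__":
    # The second value contains double quotes on purpose.
    print(to_named_json_array(["sub-a", 'sub "b"'], "subscriptionList"))
```

Parsing the result back with `json.loads` returns the original values unchanged, which is a quick way to check that the generated text is well formed.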
Third option
Try this approach, after adjusting it to your needs:
/* For each json line create a json map (SqlMap) */
@subscriptionsQuery1 =
    SELECT
        JsonFunctions.JsonTuple(JsonLine) AS JsonMap
    FROM @SubscriptionsExtractor AS t;

/* For each json map get the required property value */
@subscriptionsQuery1 =
    SELECT DISTINCT
        JsonMap["alias"] AS Subscription
    FROM @subscriptionsQuery1 AS t;

/* Join the value of all rows into a single row containing an array of all values */
@subscriptionsQuery1 =
    SELECT
        ARRAY_AGG<string>(Subscription) AS SubscriptionArray
    FROM @subscriptionsQuery1 AS t;
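The three steps above (one map per JSON line, pick one property, collapse the distinct values into a single array) can be sketched outside U-SQL as follows. This is only an illustration of the logic in Python; `collect_distinct` is a hypothetical name, and skipping lines that lack the key is a choice made for this sketch, not something the U-SQL script is known to do.

```python
import json


def collect_distinct(json_lines, key):
    """Parse each JSON line, pull one property, return the distinct values."""
    seen = []
    for line in json_lines:
        record = json.loads(line)      # step 1: one map per JSON line
        value = record.get(key)        # step 2: the required property
        if value is not None and value not in seen:
            seen.append(value)         # distinct, keeping first-seen order
    return seen                        # step 3: all rows joined into one array


if __name__ == "__main__":
    lines = ['{"alias": "a"}', '{"alias": "b"}', '{"alias": "a"}']
    print(collect_distinct(lines, "alias"))
```

Unlike this sketch, SELECT DISTINCT followed by ARRAY_AGG makes no promise about element order, so do not rely on the order of the aggregated array.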
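I was able to get this to work using the Newtonsoft-based MultiLevelJsonExtractor and a fixed-up copy of the JSON file (as posted, the sample is not valid JSON: most string values are unquoted, a comma is missing after the "largeobjects" array, and there is a trailing comma after "country"):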
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @inputFile string = @"\input\yourInputJSON.json";
DECLARE @outputFile string = @"\output\output.csv";

@input =
    EXTRACT id string,
            largeobjects string,
            date string,
            timezone string,
            lang string,
            country string
    FROM @inputFile
    USING new MultiLevelJsonExtractor("objects", false,
          "id",
          "largeobjects",
          "date",
          "timezone",
          "admin.lang",
          "admin.country"
          );

// Convert the JSON column to SQL MAP to multiple rows
@working =
    SELECT id,
           JsonFunctions.JsonTuple(largeobjects).Values AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @input;

// Explode the JSON SQL MAP
@output =
    SELECT id,
           x.y AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @working
         CROSS APPLY
             EXPLODE(largeobject) AS x(y);

OUTPUT @output
TO @outputFile
USING Outputters.Csv(quoting : false);
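My results: [screenshot not preserved]
I would say this is probably safer than a do-it-yourself approach, because the Newtonsoft library is built specifically for manipulating JSON and is tried and tested.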
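To make the extract-then-explode idea concrete outside U-SQL, the following Python sketch performs the same flattening: one output row per element of "largeobjects", with the scalar fields repeated on every row. The JSON below is a hand-corrected, trimmed copy of the question's sample and is an assumption for illustration (the fixed-up file actually used is not reproduced in this thread); `flatten_large_objects` is likewise a hypothetical name.

```python
import json

# Assumed valid-JSON rendering of the question's sample (values quoted,
# commas fixed), trimmed to the fields the script extracts.
SAMPLE = """
{
  "objects": {
    "largeobjects": ["object1", "object2", "object3"],
    "smallobjects": ["s_object1", "s_object2"]
  },
  "date": 12344232,
  "timezone": "Asia",
  "id": "sad2ddssa2",
  "admin": {"lang": "eng", "country": "us"}
}
"""


def flatten_large_objects(doc):
    """Yield one (id, largeobject, date, timezone, lang, country) row per element."""
    admin = doc.get("admin", {})
    # Iterating the array plays the role of CROSS APPLY EXPLODE.
    for obj in doc.get("objects", {}).get("largeobjects", []):
        yield (doc["id"], obj, doc["date"], doc["timezone"],
               admin.get("lang"), admin.get("country"))


if __name__ == "__main__":
    for row in flatten_large_objects(json.loads(SAMPLE)):
        print(", ".join(str(field) for field in row))
```

The sketch emits a row for every element of the array, so with three large objects it produces three rows, one more than the two shown in the question's desired output.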