Join in Stream Analytics with an element in an Array
I am trying to join streaming input with reference data in Stream Analytics.
Below is the streaming input data.
[{
"id":"111111101",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"470A01",
"id":1
},
{
"xxx":0,
"yyy":0,
"aaa":"000000",
"id":61
}
]
},
{
"id":"111111102",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"03F4EB",
"id":1
}
]
},
{
"id":"111111103",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"6706",
"id":1
}
]
}
]
Below is the reference (master) data.
[
{
"aaa": "470A01"
},
{
"aaa": "03F4EB"
},
{
"aaa": "710211"
}
]
The SAQL I wrote is as follows.
WITH INPUT1 AS (
    SELECT
        input.id.dateTime AS ID,
        flatArrayElement AS ABC
    FROM [signals2] AS input
    CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)
I use CROSS APPLY to turn each element of xyz into a separate row.
The output of INPUT1 is shown below.
+-----------+------------------------------------------------------------------------+
| ID        | ABC                                                                    |
+-----------+------------------------------------------------------------------------+
| 111111101 | {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} |
| 111111101 | {"ArrayValue":{"xxx":0,"yyy":0,"aaa":000000,"id":61},"ArrayIndex":1}   |
| 111111102 | {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} |
| 111111103 | {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":6706,"id":1},"ArrayIndex":0}   |
+-----------+------------------------------------------------------------------------+
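For readers less familiar with CROSS APPLY GetArrayElements, here is a rough Python sketch of the flattening step. It is only an illustration of the row shape (one row per array element, each carrying ArrayValue and ArrayIndex), not how Stream Analytics implements it. The function name cross_apply_array_elements is made up for this example, and the sample events are trimmed copies of the input above.

```python
def cross_apply_array_elements(events, array_field):
    """Emit one row per array element, shaped like GetArrayElements output.

    Hypothetical helper for illustration only; events whose array is
    missing or empty produce no rows, as with CROSS APPLY.
    """
    rows = []
    for event in events:
        for index, value in enumerate(event.get(array_field, [])):
            rows.append({
                "ID": event["id"],
                "ABC": {"ArrayValue": value, "ArrayIndex": index},
            })
    return rows


# Trimmed copy of the streaming input from the question.
events = [
    {"id": "111111101", "xyz": [{"aaa": "470A01", "id": 1},
                                {"aaa": "000000", "id": 61}]},
    {"id": "111111102", "xyz": [{"aaa": "03F4EB", "id": 1}]},
]

for row in cross_apply_array_elements(events, "xyz"):
    print(row["ID"], row["ABC"]["ArrayIndex"], row["ABC"]["ArrayValue"]["aaa"])
```

The first event contributes two rows (ArrayIndex 0 and 1) and the second contributes one, which mirrors the table above.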
Now I am trying to join xyz.aaa with the reference data as shown below, where master is the reference data input.
SIGNALS AS (
    SELECT * FROM INPUT1 I JOIN master M ON I.ABC.ArrayValue.aaa = M.aaa
)
I get the output below, but the problem is that the record whose xyz contains multiple elements is repeated in the output.
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| i___timestamp                 | i                                                                                        | m      |
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111102",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} | 03F4EB |
+-------------------------------+------------------------------------------------------------------------------------------+--------+
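I am confused about why the first two rows are duplicates; there should be only one entry. Of the two elements in xyz, one is valid (it matches the reference data) and one is not, yet the valid element appears twice here.
What could be the reason, and how can I fix it?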
Please see my query:
WITH INPUT1 AS (
SELECT
input.name as name,
flatArrayElement as ABC
FROM
[YourInputAlias] as input
CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)
SELECT INPUT1.ABC.ArrayValue.aaa FROM INPUT1
JOIN jayrefer on INPUT1.ABC.ArrayValue.aaa = jayrefer.item
Here jayrefer is the reference data input (your master data) and [YourInputAlias] is the streaming input.
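To sum up, the issue was caused by duplicate rows in the reference data. The join emits one output row for every matching reference row, so a reference value that appears twice yields two output rows for each matching array element.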
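To make the cause concrete, here is a small Python sketch of an inner join against reference data. It is a simulation, not Stream Analytics itself. The duplicated 470A01 row in reference_with_duplicate is a hypothetical stand-in for whatever was duplicated in the real reference file, and the helper names inner_join and dedupe are made up for this example.

```python
def inner_join(rows, reference, key):
    """Pair every flattened row with every reference row that has the same key."""
    return [
        {"i": row, "m": ref}
        for row in rows
        for ref in reference
        if row["ABC"]["ArrayValue"][key] == ref[key]
    ]


def dedupe(reference, key):
    """Keep only the first reference row for each key value."""
    seen = set()
    unique = []
    for ref in reference:
        if ref[key] not in seen:
            seen.add(ref[key])
            unique.append(ref)
    return unique


# Flattened rows, as produced by the CROSS APPLY step in the question.
rows = [
    {"ID": "111111101", "ABC": {"ArrayValue": {"aaa": "470A01"}, "ArrayIndex": 0}},
    {"ID": "111111101", "ABC": {"ArrayValue": {"aaa": "000000"}, "ArrayIndex": 1}},
    {"ID": "111111102", "ABC": {"ArrayValue": {"aaa": "03F4EB"}, "ArrayIndex": 0}},
    {"ID": "111111103", "ABC": {"ArrayValue": {"aaa": "6706"}, "ArrayIndex": 0}},
]

# Assumption: the reference file contains the 470A01 row twice.
reference_with_duplicate = [
    {"aaa": "470A01"},
    {"aaa": "470A01"},
    {"aaa": "03F4EB"},
    {"aaa": "710211"},
]

joined = inner_join(rows, reference_with_duplicate, "aaa")
fixed = inner_join(rows, dedupe(reference_with_duplicate, "aaa"), "aaa")
print(len(joined), len(fixed))
```

With the duplicated reference row the join returns three rows, with the 470A01 element appearing twice, which is the shape of the output in the question. After removing the duplicate it returns two. In practice the simplest fix is likely to clean the reference file itself so that each aaa value appears only once.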
The final SQL is:
WITH INPUT AS (
SELECT
input1.id.dateTime AS ID,
flatArrayElement as ABC
FROM input1
CROSS APPLY GetArrayElements(input1.xyz) AS flatArrayElement
)
SELECT * FROM INPUT I JOIN jayrefer M ON I.ABC.ArrayValue.aaa = M.aaa