Export Custom Event Dimensions to SQL from Application Insights using Stream Analytics
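I'm following the sample walkthrough Export to SQL from Application Insights using Stream Analytics. I'm trying to export the custom event dimensions (context.custom.dimensions in the JSON sample below), which are added to the data file as a nested JSON array. How can I flatten the dimensions array at context.custom.dimensions so that it can be exported to SQL?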
JSON...
{
"event": [
{
"name": "50_DistanceSelect",
"count": 1
}
],
"internal": {
"data": {
"id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
"documentVersion": "1.5"
}
},
"context": {
"device": {
"os": "Windows",
"osVersion": "Windows 8.1",
"type": "PC",
"browser": "Chrome",
"browserVersion": "Chrome 43.0",
"screenResolution": {
"value": "1920X1080"
},
"locale": "unknown",
"id": "browser",
"userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
},
"application": {},
"location": {
"continent": "North America",
"country": "United States",
"point": {
"lat": 38.0,
"lon": -97.0
},
"clientip": "0.115.6.185",
"province": "",
"city": ""
},
"data": {
"isSynthetic": false,
"eventTime": "2015-07-15T23:43:27.595Z",
"samplingRate": 0.0
},
"operation": {
"id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
},
"user": {
"anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
"anonAcquisitionDate": "0001-01-01T00:00:00Z",
"authAcquisitionDate": "0001-01-01T00:00:00Z",
"accountAcquisitionDate": "0001-01-01T00:00:00Z"
},
"custom": {
"dimensions": [
{
"CategoryAction": "click"
},
{
"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
},
{
"WebsiteVersion": "3.0"
},
{
"PageSection": "FilterFind"
},
{
"Category": "EventCategory1"
},
{
"Page": "/page-in-question"
}
],
"metrics": []
},
"session": {
"id": "062703E5-5E15-491A-AC75-2FE54EF03623",
"isFirst": false
}
}
}
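To pin down the target shape, here is a small Python sketch (plain Python, not Stream Analytics query language) of what "flattening" means for this payload: the array of single-key objects becomes one record with one column per key. The function name `flatten_dimensions` is hypothetical and only illustrates the desired output.

```python
from typing import Any, Dict, List


def flatten_dimensions(dimensions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge a list of single-key objects into one flat dict (one output row)."""
    flat: Dict[str, Any] = {}
    for element in dimensions:
        for key, value in element.items():
            # If a key were repeated, the last occurrence would win.
            flat[key] = value
    return flat


# The context.custom.dimensions array from the sample event above.
sample_dimensions = [
    {"CategoryAction": "click"},
    {"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"},
    {"WebsiteVersion": "3.0"},
    {"PageSection": "FilterFind"},
    {"Category": "EventCategory1"},
    {"Page": "/page-in-question"},
]

if __name__ == "__main__":
    print(flatten_dimensions(sample_dimensions))
```

Each key of the resulting dict corresponds to one SQL column in the desired output table.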
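What schema do you have in SQL? Do you want a single row in SQL with all the dimensions as columns?
That may not be possible today. However, after July 30 more Array/Record functions will be available in Azure Stream Analytics.
You will then be able to do something like this: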
-- Array indexes follow the element order in the sample JSON
-- (0 = CategoryAction, 2 = WebsiteVersion, 3 = PageSection)
SELECT
CASE
WHEN GetArrayLength(A.context.custom.dimensions) > 0
THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
ELSE ''
END AS CategoryAction,
CASE
WHEN GetArrayLength(A.context.custom.dimensions) > 2
THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'WebsiteVersion')
ELSE ''
END AS WebsiteVersion,
CASE
WHEN GetArrayLength(A.context.custom.dimensions) > 3
THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 3), 'PageSection')
ELSE ''
END AS PageSection
FROM input A
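The CASE guard in this query can be mirrored in Python to show the logic: read element `index` only when the array is long enough, otherwise fall back to an empty string. This is a sketch of the semantics, not Stream Analytics code; `dimension_at` is a hypothetical name, and `None` stands in for the NULL that GetRecordPropertyValue is assumed to return when the element does not have that property.

```python
from typing import Any, Dict, List


def dimension_at(dimensions: List[Dict[str, Any]], index: int, name: str) -> Any:
    """Python analogue of
    CASE WHEN GetArrayLength(dims) > index
         THEN GetRecordPropertyValue(GetArrayElement(dims, index), name)
         ELSE '' END
    """
    if len(dimensions) > index:
        # None plays the role of SQL NULL when the element lacks `name`.
        return dimensions[index].get(name)
    return ""


dims = [
    {"CategoryAction": "click"},
    {"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"},
    {"WebsiteVersion": "3.0"},
    {"PageSection": "FilterFind"},
]

if __name__ == "__main__":
    print(dimension_at(dims, 2, "WebsiteVersion"))  # 3.0
    print(repr(dimension_at(dims[:2], 3, "PageSection")))  # ''
```

The length check is what keeps a short array from producing an out-of-range lookup; the ELSE branch decides what the SQL column receives in that case.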
If you want a separate row for each dimension, you can use the CROSS APPLY operator.
A slightly more dynamic solution is to set up a temporary table with a WITH step:
WITH ATable AS (
SELECT
temp.internal.data.id as ID
,dimensions.ArrayValue.CategoryAction as CategoryAction
,dimensions.ArrayValue.SessionId as SessionId
,dimensions.ArrayValue.WebsiteVersion as WebsiteVersion
,dimensions.ArrayValue.PageSection as PageSection
,dimensions.ArrayValue.Category as Category
,dimensions.ArrayValue.Page as Page
FROM [analyticseventinputs] temp
CROSS APPLY GetArrayElements(temp.[context].[custom].[dimensions]) as dimensions)
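To see why this step alone does not give one row per event, here is a Python sketch of what CROSS APPLY over the array produces: one row per array element, each carrying the event id, with every dimension column except one left empty. It is an illustration only; `cross_apply_dimensions` and `COLUMNS` are hypothetical names, and `None` stands in for NULL.

```python
from typing import Any, Dict, List

COLUMNS = ["CategoryAction", "SessionId", "WebsiteVersion",
           "PageSection", "Category", "Page"]


def cross_apply_dimensions(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One output row per element of context.custom.dimensions."""
    event_id = event["internal"]["data"]["id"]
    rows = []
    for element in event["context"]["custom"]["dimensions"]:
        # Each element holds a single key, so the other columns stay None.
        row: Dict[str, Any] = {"ID": event_id}
        row.update({column: element.get(column) for column in COLUMNS})
        rows.append(row)
    return rows


event = {
    "internal": {"data": {"id": "aad2627b-60c5-48e8-aa35-197cae30a0cf"}},
    "context": {"custom": {"dimensions": [
        {"CategoryAction": "click"},
        {"Page": "/page-in-question"},
    ]}},
}

if __name__ == "__main__":
    for row in cross_apply_dimensions(event):
        print(row)
```

In this sketch an event with an empty array yields no rows at all, which matches CROSS APPLY; OUTER APPLY is the variant that keeps such events with NULL values.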
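Then join back to the input on the unique key (this SELECT follows the WITH step in the same query):
SELECT
Input.internal.data.id AS ID,
CategoryAction.CategoryAction
FROM [analyticseventinputs] Input
LEFT JOIN ATable CategoryAction ON
Input.internal.data.id = CategoryAction.ID AND
CategoryAction.CategoryAction <> '' AND
DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5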
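The rather annoying part is the DATEDIFF requirement: joins are designed to combine two data streams, whereas here you are only joining on a unique key. So I set it to a generous 5 days. Its main advantage over the other solutions is that it guards against the custom parameters not arriving in a consistent order.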
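The join condition combines three tests: a matching id, a non-empty value for the dimension, and a time bound. A rough Python sketch of that predicate follows. It approximates `DATEDIFF(day, ...) BETWEEN 0 AND 5` as an elapsed-time window of 0 to 5 days, which is an assumption about how the bound should be read; `join_dimension`, the `Time` field and the ids `e1`/`e2` are hypothetical toy values.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


def join_dimension(event_id: str, event_time: datetime,
                   exploded_rows: List[Dict[str, Any]], column: str,
                   max_days: int = 5) -> Optional[Any]:
    """First matching value for `column`, or None for a LEFT JOIN miss.

    A real join emits one output row per match; this sketch keeps the first.
    """
    for row in exploded_rows:
        if row["ID"] != event_id:
            continue  # ids must match
        if row[column] is None or row[column] == "":
            continue  # the dimension must be present and non-empty
        elapsed = row["Time"] - event_time
        if timedelta(0) <= elapsed <= timedelta(days=max_days):
            return row[column]  # within the approximated DATEDIFF bound
    return None


t0 = datetime(2015, 7, 15, 23, 43, 27)
rows = [
    {"ID": "e1", "Time": t0, "CategoryAction": None},
    {"ID": "e1", "Time": t0, "CategoryAction": "click"},
]

if __name__ == "__main__":
    print(join_dimension("e1", t0, rows, "CategoryAction"))  # click
```

Because the exploded rows come from the same event, their timestamps coincide with the input's, so a wide window such as 5 days never excludes a legitimate match; it only has to satisfy the requirement that a stream join be time-bounded.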
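Most online tutorials use CROSS APPLY or OUTER APPLY, but that isn't what you're looking for, because it puts each property on a separate row. To get around this, use the GetRecordPropertyValue and GetArrayElement functions as shown below, which flatten the properties into a single row. Note that this relies on each dimension always appearing at the same position in the array.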
SELECT
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
[outputstream]
FROM
[inputstream] MySource
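Because each column is read from a fixed index, the query above assumes every event sends its dimensions in the same order. The Python sketch below (not Stream Analytics code; `POSITIONS` and `flatten_by_position` are hypothetical names) mimics the index-to-name mapping used in that query and shows what happens when two elements swap places.

```python
from typing import Any, Dict, List

# Index used for each column in the query above.
POSITIONS = {"CategoryAction": 0, "SessionId": 1, "WebsiteVersion": 2,
             "PageSection": 3, "Category": 4, "Page": 5}


def flatten_by_position(dimensions: List[Dict[str, Any]]) -> Dict[str, Any]:
    row = {}
    for name, index in POSITIONS.items():
        # Assumption: a missing index or missing property yields None (NULL).
        element = dimensions[index] if index < len(dimensions) else {}
        row[name] = element.get(name)
    return row


in_order = [{"CategoryAction": "click"},
            {"SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"},
            {"WebsiteVersion": "3.0"}, {"PageSection": "FilterFind"},
            {"Category": "EventCategory1"}, {"Page": "/page-in-question"}]
# Same data with the first two elements swapped.
swapped = [in_order[1], in_order[0]] + in_order[2:]

if __name__ == "__main__":
    print(flatten_by_position(in_order)["CategoryAction"])  # click
    print(flatten_by_position(swapped)["CategoryAction"])   # None
```

Both swapped columns come back empty rather than raising an error, so an ordering change would silently produce NULLs in SQL.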
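A very handy approach, suggested by Alex Raizman, is to aggregate the fields you want to flatten and group by the rest of the select list, assuming that:
- you know the set of possible objects in dimensions,
- there are no duplicate objects in the array, and
- something uniquely identifies each original row (for example, an id).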
SELECT
MySource.internal.data.id AS Id,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'CategoryAction') AS NVARCHAR(MAX))) AS CategoryAction,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'SessionId') AS NVARCHAR(MAX))) AS SessionId,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'WebsiteVersion') AS NVARCHAR(MAX))) AS WebsiteVersion,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'PageSection') AS NVARCHAR(MAX))) AS PageSection,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'Category') AS NVARCHAR(MAX))) AS Category,
MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'Page') AS NVARCHAR(MAX))) AS Page
INTO
[outputstream]
FROM [inputstream] MySource
CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d
-- In the exported JSON the event id lives at internal.data.id
GROUP BY System.Timestamp(), MySource.internal.data.id
We also group by System.Timestamp to create a time window, which Stream Analytics requires in order to perform set-based operations such as counts or aggregations.
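The aggregation trick works because MIN, as in standard SQL, ignores NULLs: after CROSS APPLY each row has at most one non-NULL dimension, so per id the MIN of each column is simply that one value. A Python sketch of the grouping follows; it groups by id only and leaves out the System.Timestamp window, and `min_per_column` plus the toy ids are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def min_per_column(rows: List[Dict[str, Any]],
                   columns: List[str]) -> Dict[str, Dict[str, Any]]:
    """GROUP BY ID with MIN(column) per column, skipping None like SQL NULL."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {column: None for column in columns})
    for row in rows:
        acc = grouped[row["ID"]]
        for column in columns:
            value = row.get(column)
            if value is not None and (acc[column] is None or value < acc[column]):
                acc[column] = value
    return dict(grouped)


# Toy exploded rows: one dimension per row, as CROSS APPLY would produce.
rows = [
    {"ID": "e1", "CategoryAction": "click", "Page": None},
    {"ID": "e1", "CategoryAction": None, "Page": "/page-in-question"},
    {"ID": "e2", "CategoryAction": "click", "Page": None},
]

if __name__ == "__main__":
    print(min_per_column(rows, ["CategoryAction", "Page"]))
```

This also shows why the "no duplicates" assumption matters: if an event carried the same dimension twice, MIN would silently keep only the smaller value.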
The question is old, but this is how to get the custom dimensions into a single row. It gets ugly as the number of custom dimensions grows.
SELECT
A.internal.data.id,
eventFlat.ArrayValue.name as eventName,
A.context.operation.name as operation,
A.context.data.eventTime,
a1.company,
a2.userId,
a3.feature,
A.context.device,
A.context.location
FROM [YourInputAlias] A
OUTER APPLY GetArrayElements(A.event) eventFlat
LEFT JOIN (
SELECT
S1.internal.data.id AS id,
customDimensionsFlat.ArrayValue.company AS company
FROM [YourInputAlias] S1
OUTER APPLY GetArrayElements(S1.context.custom.dimensions) customDimensionsFlat
WHERE customDimensionsFlat.ArrayValue.company IS NOT NULL
) a1 ON A.internal.data.id = a1.id AND DATEDIFF(day, A, a1) BETWEEN 0 AND 5
LEFT JOIN (
SELECT
S2.internal.data.id AS id,
customDimensionsFlat.ArrayValue.userId AS userId
FROM [YourInputAlias] S2
OUTER APPLY GetArrayElements(S2.context.custom.dimensions) customDimensionsFlat
WHERE customDimensionsFlat.ArrayValue.userId IS NOT NULL
) a2 ON A.internal.data.id = a2.id AND DATEDIFF(day, A, a2) BETWEEN 0 AND 5
LEFT JOIN (
SELECT
S3.internal.data.id AS id,
customDimensionsFlat.ArrayValue.feature AS feature
FROM [YourInputAlias] S3
OUTER APPLY GetArrayElements(S3.context.custom.dimensions) customDimensionsFlat
WHERE customDimensionsFlat.ArrayValue.feature IS NOT NULL
) a3 ON A.internal.data.id = a3.id AND DATEDIFF(day, A, a3) BETWEEN 0 AND 5
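Each LEFT JOIN in this query is one subquery per dimension: explode the array, keep only the rows where that dimension is present, and match back on the id. A Python sketch of that pattern follows. The names `left_join_dimensions`, `events` and the toy values are hypothetical; the sketch takes the first match, whereas a real join emits one output row per match, and it ignores the DATEDIFF bound.

```python
from typing import Any, Dict, List


def left_join_dimensions(events: List[Dict[str, Any]],
                         names: List[str]) -> List[Dict[str, Any]]:
    out = []
    for event in events:
        row: Dict[str, Any] = {"id": event["id"]}
        for name in names:
            # Subquery: elements where ArrayValue.<name> IS NOT NULL.
            matches = [element[name] for element in event.get("dimensions", [])
                       if element.get(name) is not None]
            # LEFT JOIN: keep the event even when nothing matches.
            row[name] = matches[0] if matches else None
        out.append(row)
    return out


events = [
    {"id": "e1", "dimensions": [{"company": "Contoso"}, {"userId": "u42"}]},
    {"id": "e2", "dimensions": [{"feature": "export"}]},
]

if __name__ == "__main__":
    for row in left_join_dimensions(events, ["company", "userId", "feature"]):
        print(row)
```

Every extra dimension adds one more subquery and one more join, which is why this approach grows unwieldy as the number of custom dimensions increases.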