Export Custom Event Dimensions to SQL from Application Insights using Stream Analytics

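I am following the example walkthrough "Export to SQL from Application Insights using Stream Analytics". I am trying to export custom event dimensions (context.custom.dimensions in the JSON sample below), which are added to the data file as a nested JSON array. How do I flatten the dimensions array at context.custom.dimensions for export to SQL?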

JSON...

{
  "event": [
    {
      "name": "50_DistanceSelect",
      "count": 1
    }
  ],
  "internal": {
    "data": {
      "id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
      "documentVersion": "1.5"
    }
  },
  "context": {
    "device": {
      "os": "Windows",
      "osVersion": "Windows 8.1",
      "type": "PC",
      "browser": "Chrome",
      "browserVersion": "Chrome 43.0",
      "screenResolution": {
        "value": "1920X1080"
      },
      "locale": "unknown",
      "id": "browser",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    },
    "application": {},
    "location": {
      "continent": "North America",
      "country": "United States",
      "point": {
        "lat": 38.0,
        "lon": -97.0
      },
      "clientip": "0.115.6.185",
      "province": "",
      "city": ""
    },
    "data": {
      "isSynthetic": false,
      "eventTime": "2015-07-15T23:43:27.595Z",
      "samplingRate": 0.0
    },
    "operation": {
      "id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
    },
    "user": {
      "anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
      "anonAcquisitionDate": "0001-01-01T00:00:00Z",
      "authAcquisitionDate": "0001-01-01T00:00:00Z",
      "accountAcquisitionDate": "0001-01-01T00:00:00Z"
    },
    "custom": {
      "dimensions": [
        {
          "CategoryAction": "click"
        },
        {
          "SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
        },
        {
          "WebsiteVersion": "3.0"
        },
        {
          "PageSection": "FilterFind"
        },
        {
          "Category": "EventCategory1"
        },
        {
          "Page": "/page-in-question"
        }
      ],
      "metrics": []
    },
    "session": {
      "id": "062703E5-5E15-491A-AC75-2FE54EF03623",
      "isFirst": false
    }
  }
}

What schema do you have in SQL? Do you want a single row in SQL with all of the dimensions as columns?

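This may not be possible today. However, after July 30, more Array/Record functions will be available in Azure Stream Analytics.

You will then be able to do something like this: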

SELECT
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 0
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
        ELSE ''
        END AS CategoryAction,
    -- Indexes follow the sample event: WebsiteVersion is element 2, PageSection is element 3
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 2
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'WebsiteVersion')
        ELSE ''
        END AS WebsiteVersion,
    CASE
        WHEN GetArrayLength(A.context.custom.dimensions) > 3
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 3), 'PageSection')
        ELSE ''
        END AS PageSection
FROM input A

If you want a separate row for each dimension instead, you can use the CROSS APPLY operator.
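
As a rough sketch of that one-row-per-dimension approach (not taken from the original answer), the query below chains two CROSS APPLY calls: GetArrayElements unnests the dimensions array, and GetRecordProperties splits each single-property record into a name/value pair. The aliases [inputstream] and [outputstream] are placeholders for your own job input and output, and the output column names (EventId, EventTime, DimensionName, DimensionValue) are assumptions.

```sql
-- Sketch only: emits one row per custom dimension (long/narrow format).
-- [inputstream] and [outputstream] are placeholder aliases for the job's input and output.
SELECT
    MySource.internal.data.id       AS EventId,
    MySource.context.data.eventTime AS EventTime,
    prop.PropertyName               AS DimensionName,
    prop.PropertyValue              AS DimensionValue
INTO
    [outputstream]
FROM
    [inputstream] MySource
-- Unnest the array: one row per element, exposed as dim.ArrayValue
CROSS APPLY GetArrayElements(MySource.context.custom.dimensions) AS dim
-- Each element is a record with a single property; split it into name and value
CROSS APPLY GetRecordProperties(dim.ArrayValue) AS prop
```

For the sample event in the question, this should produce six rows that share the same EventId, one for each entry in the dimensions array. The trade-off is that pivoting the rows back into columns then has to happen on the SQL side.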

A slightly more dynamic solution is to set up a temp table (here, a WITH step):

WITH ATable AS (
SELECT
     temp.internal.data.id AS ID
    ,dimensions.ArrayValue.CategoryAction AS CategoryAction
    ,dimensions.ArrayValue.SessionId AS SessionId
    ,dimensions.ArrayValue.WebsiteVersion AS WebsiteVersion
    ,dimensions.ArrayValue.PageSection AS PageSection
    ,dimensions.ArrayValue.Category AS Category
    ,dimensions.ArrayValue.Page AS Page
FROM [analyticseventinputs] temp
-- GetArrayElements is the documented function name; the original snippet used GetElements
CROSS APPLY GetArrayElements(temp.[context].[custom].[dimensions]) AS dimensions)

Then join on the unique key:

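-- The SELECT list is illustrative; the original snippet started at FROM
SELECT
    Input.internal.data.id AS ID,
    CategoryAction.CategoryAction
FROM [analyticseventinputs] Input
LEFT JOIN ATable CategoryAction ON
    Input.internal.data.id = CategoryAction.ID AND
    CategoryAction.CategoryAction <> '' AND
    DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5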

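The rather annoying part is the DATEDIFF requirement: joins are designed to combine two data streams, but in this case you are only joining on a unique key, so I set it to a large value of 5 days. Compared with the other solution, the only real advantage is that it protects against custom parameters arriving in a different order.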

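Most online tutorials use CROSS APPLY or OUTER APPLY, but that is not what you are looking for here, because it puts each property on a separate row. To get around this, use the functions GetRecordPropertyValue and GetArrayElement as shown below. This flattens the properties into a single row.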

SELECT
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
  [outputstream]
FROM
  [inputstream] MySource
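
On the SQL side, the destination table needs columns whose names match the columns the query emits. A minimal sketch of such a table follows; the table name dbo.CustomEventDimensions and the column lengths are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical destination table for the flattened query above.
-- Table name and column sizes are assumptions; adjust them to your data.
CREATE TABLE dbo.CustomEventDimensions (
    CategoryAction NVARCHAR(256)  NULL,
    SessionId      NVARCHAR(64)   NULL,
    WebsiteVersion NVARCHAR(32)   NULL,
    PageSection    NVARCHAR(256)  NULL,
    Category       NVARCHAR(256)  NULL,
    Page           NVARCHAR(2048) NULL
);
```

The columns are left nullable because the query looks dimensions up by array position: an event that omits a dimension, or carries it at a different index, may yield NULL for that column.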

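A very handy approach suggested by Alex Raizman is to aggregate the fields you want to flatten and group by the rest of the select list, assuming that

  • you know the set of possible objects in dimensions, and
  • there are no duplicate objects in this array, and
  • something uniquely identifies your initial row (e.g. an id)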

    SELECT
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'CategoryAction') AS NVARCHAR(MAX))) AS CategoryAction,
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'SessionId') AS NVARCHAR(MAX))) AS SessionId,
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'WebsiteVersion') AS NVARCHAR(MAX))) AS WebsiteVersion,
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'PageSection') AS NVARCHAR(MAX))) AS PageSection,
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'Category') AS NVARCHAR(MAX))) AS Category,
      MIN(CAST(GetRecordPropertyValue(d.ArrayValue, 'Page') AS NVARCHAR(MAX))) AS Page
    INTO
      [outputstream]
    FROM [inputstream] MySource
    CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d
    -- In the sample event the unique id lives at internal.data.id, not at a top-level id
    GROUP BY System.Timestamp, MySource.internal.data.id
    

We also group by System.Timestamp to create a time window, which Stream Analytics expects for set-based operations such as counts or aggregates.

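Although this is an old question, this is how custom dimensions can be brought into a single row. It gets ugly as the number of custom dimensions grows.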

SELECT
    A.internal.data.id,
    eventFlat.ArrayValue.name AS eventName,
    A.context.operation.name AS operation,
    A.context.data.eventTime,
    a1.company,
    a2.userId,
    a3.feature,
    A.context.device,
    A.context.location
FROM [YourInputAlias] A
OUTER APPLY GetArrayElements(A.event) eventFlat
LEFT JOIN (
    SELECT
        A1.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.company
    FROM [YourInputAlias] A1
    OUTER APPLY GetArrayElements(A1.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.company IS NOT NULL
) a1 ON A.internal.data.id = a1.id AND DATEDIFF(day, A, a1) BETWEEN 0 AND 5
LEFT JOIN (
    SELECT
        A2.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.userid
    FROM [YourInputAlias] A2
    OUTER APPLY GetArrayElements(A2.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.userid IS NOT NULL
) a2 ON A.internal.data.id = a2.id AND DATEDIFF(day, A, a2) BETWEEN 0 AND 5
LEFT JOIN (
    SELECT
        A3.internal.data.id AS id,
        customDimensionsFlat.ArrayValue.feature
    FROM [YourInputAlias] A3
    OUTER APPLY GetArrayElements(A3.context.custom.dimensions) customDimensionsFlat
    WHERE customDimensionsFlat.ArrayValue.feature IS NOT NULL
) a3 ON A.internal.data.id = a3.id AND DATEDIFF(day, A, a3) BETWEEN 0 AND 5