Unstack a big data table in Kusto by timestamp and category
I'm working with a large dataset on ADX, where I need to unstack rows of data and turn them into columns. The unique ID in the data is the combination of three fields: group, timestamp, and name.
datatable(Group:string, timestamp:datetime, channel_name:string , value:long)
[
"A", datetime(2019-05-01 00:00:01), "channel_1", 12,
"A", datetime(2019-05-01 00:00:02), "channel_1", 14,
"A", datetime(2019-05-01 00:00:03), "channel_1", 16,
"A", datetime(2019-05-01 00:00:01), "channel_2", 12,
"A", datetime(2019-05-01 00:00:02), "channel_2", 14,
"A", datetime(2019-05-01 00:00:01), "channel_3", 16,
"B", datetime(2019-04-01 00:00:01), "channel_1", 3,
"B", datetime(2019-04-01 00:00:04), "channel_1", 5,
"B", datetime(2019-04-01 00:00:07), "channel_2", 1,
"B", datetime(2019-04-01 00:00:10), "channel_3", 8,
]
Expected output for one group (the result can be with or without the group column, since a group filter will always be applied):
group, timestamp, channel_1, channel_2, channel_3
"A", datetime(2019-05-01 00:00:01), 12, 12, NULL
"A", datetime(2019-05-01 00:00:02), 14, 14, NULL
"A", datetime(2019-05-01 00:00:03), 16, NULL, NULL
I tried running the following query (based on), but it did not unstack the columns as expected; it returned data in the same format as above.
| where timestamp > datetime(2019-04-01) and timestamp < datetime(2019-04-03) // filter1: always applied
| where machine_name == 'A' // filter2: always applied
| where channel_name in ("channel_1", "channel_2", "channel_3")
| summarize value=sum(value) by channel_name, timestamp
You can try this:
datatable(group:string, timestamp:datetime, channel_name:string , value:long)
[
"A", datetime(2019-05-01 00:00:01), "channel_1", 12,
"A", datetime(2019-05-01 00:00:02), "channel_1", 14,
"A", datetime(2019-05-01 00:00:03), "channel_1", 16,
"A", datetime(2019-05-01 00:00:01), "channel_2", 12,
"A", datetime(2019-05-01 00:00:02), "channel_2", 14,
"A", datetime(2019-05-01 00:00:01), "channel_3", 16,
"B", datetime(2019-04-01 00:00:01), "channel_1", 3,
"B", datetime(2019-04-01 00:00:04), "channel_1", 5,
"B", datetime(2019-04-01 00:00:07), "channel_2", 1,
"B", datetime(2019-04-01 00:00:10), "channel_3", 8,
]
| where group == "A"
| summarize b = make_bag(pack(channel_name, value)) by timestamp
| project timestamp, channel_1 = tolong(b.channel_1), channel_2 = tolong(b.channel_2), channel_3 = tolong(b.channel_3)
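If you want to keep the group column in the output (the question allows either form), the same approach works with both keys in the `summarize` — a sketch based on the query above:

```kusto
datatable(group:string, timestamp:datetime, channel_name:string, value:long)
[
"A", datetime(2019-05-01 00:00:01), "channel_1", 12,
"A", datetime(2019-05-01 00:00:02), "channel_1", 14,
"A", datetime(2019-05-01 00:00:03), "channel_1", 16,
"A", datetime(2019-05-01 00:00:01), "channel_2", 12,
"A", datetime(2019-05-01 00:00:02), "channel_2", 14,
"A", datetime(2019-05-01 00:00:01), "channel_3", 16,
]
| where group == "A"
// group by both keys so the group column survives the aggregation
| summarize b = make_bag(pack(channel_name, value)) by group, timestamp
| project group, timestamp, channel_1 = tolong(b.channel_1), channel_2 = tolong(b.channel_2), channel_3 = tolong(b.channel_3)
```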
Or this (less efficient, due to the use of bag_unpack()):
datatable(group:string, timestamp:datetime, channel_name:string , value:long)
[
"A", datetime(2019-05-01 00:00:01), "channel_1", 12,
"A", datetime(2019-05-01 00:00:02), "channel_1", 14,
"A", datetime(2019-05-01 00:00:03), "channel_1", 16,
"A", datetime(2019-05-01 00:00:01), "channel_2", 12,
"A", datetime(2019-05-01 00:00:02), "channel_2", 14,
"A", datetime(2019-05-01 00:00:01), "channel_3", 16,
"B", datetime(2019-04-01 00:00:01), "channel_1", 3,
"B", datetime(2019-04-01 00:00:04), "channel_1", 5,
"B", datetime(2019-04-01 00:00:07), "channel_2", 1,
"B", datetime(2019-04-01 00:00:10), "channel_3", 8,
]
| where group == "A"
| summarize b = make_bag(pack(channel_name, value)) by timestamp
| evaluate bag_unpack(b)
Both output this table:
| timestamp | channel_1 | channel_2 | channel_3 |
|-----------------------------|-----------|-----------|-----------|
| 2019-05-01 00:00:01.0000000 | 12 | 12 | 16 |
| 2019-05-01 00:00:02.0000000 | 14 | 14 | |
| 2019-05-01 00:00:03.0000000 | 16 | | |
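As a side note (an alternative not mentioned in the answer above): Kusto also ships a pivot plugin that performs this kind of unstacking directly. Like bag_unpack(), it infers the output column names at query time, so the result schema is not static; a sketch:

```kusto
datatable(group:string, timestamp:datetime, channel_name:string, value:long)
[
"A", datetime(2019-05-01 00:00:01), "channel_1", 12,
"A", datetime(2019-05-01 00:00:02), "channel_1", 14,
"A", datetime(2019-05-01 00:00:03), "channel_1", 16,
"A", datetime(2019-05-01 00:00:01), "channel_2", 12,
"A", datetime(2019-05-01 00:00:02), "channel_2", 14,
"A", datetime(2019-05-01 00:00:01), "channel_3", 16,
]
| where group == "A"
// keep only the key, pivot column, and value before pivoting
| project timestamp, channel_name, value
| evaluate pivot(channel_name, sum(value))
```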