Aggregating Counts in AWS Timestream Causes Errors
I am pushing telemetry data into AWS Timestream:
measure_value::varchar | IP | time | measure_name |
---|---|---|---|
test.html | 192.168.1.100 | 2021-05-25 14:27:45 | hits |
blah.html | 192.168.1.101 | 2021-05-25 14:27:45 | hits |
test.html | 192.168.1.102 | 2021-05-25 14:27:46 | hits |
I want to display an aggregation of the data in Timestream, showing the number of hits per URI per hour.
measure_value::varchar | Count | time |
---|---|---|
test.html | 2 | 2021-05-25 14:00 |
blah.html | 1 | 2021-05-25 14:00 |
I am trying to use:
SELECT measure_value::varchar AS URIs, CREATE_TIME_SERIES(time, measure_value::varchar) AS served
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar
But I get the error message:
ValidationException: Duplicate timestamps are not allowed in a timeseries.
Am I using the wrong function, or is my data wrong?
===================
Trying @berto99's solution... I get:
SELECT measure_value::varchar AS URIs, date_trunc('hour', time) AS hour, count(measure_value::varchar) as queries
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, date_trunc('hour', time)
=====================
Update #2:
Getting there, but still not 100% there.
SELECT measure_value::varchar AS URIs, bin(time, 15m) AS hour, count(measure_value::varchar) as queries
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, bin(time, 15m) order by hour
I know there is probably a better way to do this, but something like this:
SELECT measure_value::varchar AS URIs, date_trunc('hour', time) AS time, count(*) AS Count
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, date_trunc('hour', time)
ORDER BY date_trunc('hour', time)
You may also need to adjust for your time zone with date_trunc('hour', time at time zone '-X'), where X is your time zone.
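To make the grouping concrete, here is a minimal Python sketch of what truncating to the hour and counting per URI does to the three sample rows from the question. It only mimics the logic of the query and does not talk to Timestream. The function name `hourly_hits`, the `utc_offset_hours` parameter, and the assumption that the stored timestamps are UTC are mine, not part of the original post.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Sample rows from the question: (uri, timestamp). Assumed to be UTC.
ROWS = [
    ("test.html", "2021-05-25 14:27:45"),
    ("blah.html", "2021-05-25 14:27:45"),
    ("test.html", "2021-05-25 14:27:46"),
]


def hourly_hits(rows, utc_offset_hours=0):
    """Count hits per (uri, hour), mimicking
    GROUP BY measure_value::varchar, date_trunc('hour', time)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for uri, stamp in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Shift to the viewer's time zone before truncating to the hour.
        local = ts.replace(tzinfo=timezone.utc).astimezone(tz)
        hour = local.replace(minute=0, second=0, microsecond=0)
        counts[(uri, hour.strftime("%Y-%m-%d %H:%M"))] += 1
    return dict(counts)


if __name__ == "__main__":
    for (uri, hour), hits in sorted(hourly_hits(ROWS).items()):
        print(uri, hits, hour)
```

With the default offset this gives 2 hits for test.html and 1 for blah.html in the 14:00 bucket, which matches the table the question asks for. Passing a non-zero offset only relabels the buckets for whole-hour offsets; for half-hour offsets the bucket boundaries themselves move.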
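Combining @Berto99's suggestion with some more digging on Stack Overflow finally got me there - TimeStream + Grafana: not recognizing series in data.
You have to put Berto99's suggestion into a subquery and then run it through CREATE_TIME_SERIES. The final query is: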
WITH binned_query AS (
SELECT measure_value::varchar AS URIs, bin(time, 15m) AS bin_time, count(measure_value::varchar) as queries
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, bin(time, 15m) order by bin_time
)
SELECT URIs, CREATE_TIME_SERIES(bin_time,queries) as Endpoint
FROM binned_query
GROUP BY URIs
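A rough Python sketch of why the two-step shape helps, as I understand it: once the inner step has counted per (URI, bucket), each URI has at most one point per timestamp, so building the series no longer hits the duplicate-timestamp check. `create_time_series` below is a toy stand-in I wrote for illustration, not the real Timestream function, and the rows are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def create_time_series(points):
    """Toy stand-in for CREATE_TIME_SERIES: rejects repeated timestamps."""
    seen = set()
    for ts, _value in points:
        if ts in seen:
            raise ValueError("Duplicate timestamps are not allowed in a timeseries.")
        seen.add(ts)
    return sorted(points)


def bin_15m(ts):
    """Floor a datetime to the start of its 15-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def series_per_uri(rows):
    """Inner step: count per (uri, bucket). Outer step: one series per uri."""
    counts = Counter((uri, bin_15m(ts)) for uri, ts in rows)
    grouped = defaultdict(list)
    for (uri, bucket), hits in counts.items():
        grouped[uri].append((bucket, hits))
    return {uri: create_time_series(points) for uri, points in grouped.items()}


# Hypothetical rows: two hits on test.html share the exact same timestamp.
ROWS = [
    ("test.html", datetime(2021, 5, 25, 14, 27, 45)),
    ("test.html", datetime(2021, 5, 25, 14, 27, 45)),
    ("test.html", datetime(2021, 5, 25, 14, 31, 2)),
    ("blah.html", datetime(2021, 5, 25, 14, 27, 45)),
]

if __name__ == "__main__":
    for uri, series in series_per_uri(ROWS).items():
        print(uri, series)
```

Feeding the raw test.html rows straight into `create_time_series` raises the same kind of error as the original query, because two of them share a timestamp. After the count per bucket, the series for test.html has one point at 14:15 and one at 14:30.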
I switched from date_trunc to bin because it gives you more flexibility, for example 15-minute intervals.
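For anyone wondering what the difference amounts to, here is a small Python sketch of flooring a timestamp to an arbitrary bucket width. It assumes buckets are aligned to multiples of the width counted from the Unix epoch, which is how I understand bin to behave; `bin_time` and `EPOCH` are names I made up for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for the buckets


def bin_time(ts, width):
    """Floor ts to the nearest lower multiple of width since EPOCH."""
    return ts - (ts - EPOCH) % width


if __name__ == "__main__":
    ts = datetime(2021, 5, 25, 14, 27, 45)
    for width in (timedelta(hours=1), timedelta(minutes=15), timedelta(minutes=5)):
        print(width, "->", bin_time(ts, width))
```

A one-hour width gives the same bucket start as truncating to the hour (14:00:00 for the sample timestamp), while 15 minutes gives 14:15:00 and 5 minutes gives 14:25:00. Truncation can only cut at calendar units such as minute, hour, or day.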
Pretty graph: