Athena 在 CloudFront 日志上查询 5 分钟间隔内的最大速率
Athena query on CloudFront Logs for max rate during 5 minute interval
我正在尝试查询 CloudFront 日志以根据 IP 地址得出最大请求速率的流量基准。我正在努力将日期和时间列组合成一个时间戳,我可以用它来生成 5 分钟的间隔。我什至结束了这个查询?
SELECT from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) AS datetime, request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip
ORDER BY datetime ASC
LIMIT 10;
我正在努力达到可以做这样的事情的地步:
SELECT
FROM_UNIXTIME((UNIX_TIMESTAMP(table.timestamp) DIV 300) * 300) AS Timestamp,
request_ip,
COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip, count
ORDER BY datetime ASC
LIMIT 10;
根据请求,这里是基于日志格式的 cloudfront_logs table 的格式。我试图在 table 创建期间创建串联时间戳,但我有很多语法错误:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
request_ip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
user_agent STRING,
query_string STRING,
cookie STRING,
result_type STRING,
request_id STRING,
host_header STRING,
request_protocol STRING,
request_bytes BIGINT,
time_taken FLOAT,
xforwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
response_result_type STRING,
http_version STRING,
fle_status STRING,
fle_encrypted_fields INT,
c_port INT,
time_to_first_byte FLOAT,
x_edge_detailed_result_type STRING,
sc_content_type STRING,
sc_content_len BIGINT,
sc_range_start BIGINT,
sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/CloudFront/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
摘自 Querying Amazon CloudFront Logs
上的官方 AWS 文档
失败示例:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
timestamp AS from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) DATE,
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
语法错误:
line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: e5147782-2e62-465e...)
更新:
这是实际日志中的示例:
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
2020-10-29 04:59:03 MEL50-C1 1524 192.168.0.2 GET example.cloudfront.net /mobile/js/category/category.js 200 https://www.example.com/ Mozilla/5.0%20(Linux;%20Android%2010;%20A7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/192.168.0.1%20Mobile%20Safari/537.36 v=2020102601 cookie=redacted Hit BSVYL08_EXAMPLE== cdn.example.com https 2001 0.005 - TLSv1.3 TLS_AES_128_GCM_SHA256 Hit HTTP/2.0
- - 26906 0.004 Hit application/javascript 787 - -
解决方案:
SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip
ORDER BY count DESC;
也许有更短的表达方式,但您应该可以使用以下表达式将日期 + 时间四舍五入为 5 分钟间隔:
from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300)
如果你在第一次查询中使用它,你应该得到你想要的。
我正在尝试查询 CloudFront 日志以根据 IP 地址得出最大请求速率的流量基准。我正在努力将日期和时间列组合成一个时间戳,我可以用它来生成 5 分钟的间隔。我什至结束了这个查询?
SELECT from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) AS datetime, request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip
ORDER BY datetime ASC
LIMIT 10;
我正在努力达到可以做这样的事情的地步:
SELECT
FROM_UNIXTIME((UNIX_TIMESTAMP(table.timestamp) DIV 300) * 300) AS Timestamp,
request_ip,
COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip, count
ORDER BY datetime ASC
LIMIT 10;
根据请求,这里是基于日志格式的 cloudfront_logs table 的格式。我试图在 table 创建期间创建串联时间戳,但我有很多语法错误:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
request_ip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
user_agent STRING,
query_string STRING,
cookie STRING,
result_type STRING,
request_id STRING,
host_header STRING,
request_protocol STRING,
request_bytes BIGINT,
time_taken FLOAT,
xforwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
response_result_type STRING,
http_version STRING,
fle_status STRING,
fle_encrypted_fields INT,
c_port INT,
time_to_first_byte FLOAT,
x_edge_detailed_result_type STRING,
sc_content_type STRING,
sc_content_len BIGINT,
sc_range_start BIGINT,
sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/CloudFront/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
摘自 Querying Amazon CloudFront Logs
上的官方 AWS 文档失败示例:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
timestamp AS from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) DATE,
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
语法错误:
line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: e5147782-2e62-465e...)
更新:
这是实际日志中的示例:
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
2020-10-29 04:59:03 MEL50-C1 1524 192.168.0.2 GET example.cloudfront.net /mobile/js/category/category.js 200 https://www.example.com/ Mozilla/5.0%20(Linux;%20Android%2010;%20A7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/192.168.0.1%20Mobile%20Safari/537.36 v=2020102601 cookie=redacted Hit BSVYL08_EXAMPLE== cdn.example.com https 2001 0.005 - TLSv1.3 TLS_AES_128_GCM_SHA256 Hit HTTP/2.0
- - 26906 0.004 Hit application/javascript 787 - -
解决方案:
SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip
ORDER BY count DESC;
也许有更短的表达方式,但您应该可以使用以下表达式将日期 + 时间四舍五入为 5 分钟间隔:
from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300)
如果你在第一次查询中使用它,你应该得到你想要的。