Using Athena to get terminatingrule from rulegrouplist in AWS WAF logs
I followed these instructions to get my AWS WAF data into an Athena table.
I would like to query the data to find the latest requests with an action of BLOCK. This query works:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;
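My issue is cleanly identifying the "terminatingrule", that is, the reason why the request was blocked. As an example, a result has
terminatingrule = AWS-AWSManagedRulesCommonRuleSet
and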
rulegrouplist = [
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesKnownBadInputsRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet",
"terminatingrule": "null",
"excludedrules": "null"
},
{
"nonterminatingmatchingrules": [],
"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet",
"terminatingrule": {
"rulematchdetails": "null",
"action": "BLOCK",
"ruleid": "NoUserAgent_HEADER"
},
"excludedrules":"null"
}
]
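The data I would like separated into its own column is rulegrouplist[terminatingrule].ruleid, which has a value of NoUserAgent_HEADER.
AWS provides useful information on querying nested Athena arrays, but I have been unable to get the result I'm looking for.
I have framed this as an AWS question, but since Athena uses SQL queries, anyone with good SQL skills could probably work this out.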
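I'm not entirely sure what exactly you want, but I'm going to assume that you are after the array element where terminatingrule is not "null" (I will also assume that if there are several you want the first).
The documentation you link to says that the rulegrouplist column is of type array<string>. The reason it is string and not a complex type is that the column seems to have multiple different schemas. One example is that the terminatingrule property is either the string "null" or a struct/object, which is something that can't be described using Athena's type system.
This is not a problem, however. When dealing with JSON there is a whole set of JSON functions that can be used. Here's one way to use json_extract combined with filter and element_at to remove array elements where the terminatingrule property is the string "null" and then pick the first of the remaining elements: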
SELECT
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
) AS first_non_null_terminatingrule
FROM waf_logs
WHERE action = 'BLOCK'
-- this query defines no date alias, so sort on the raw timestamp column
ORDER BY timestamp DESC
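You say you want the "latest", which to me is ambiguous: it could mean either the first non-null element or the last non-null element. The query above returns the first non-null element. If you want the last one, change the second argument of element_at to -1 (Athena's array indexing starts at 1, and -1 counts from the end).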
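To see the indexing behavior in isolation, here is a minimal sketch against a literal array. The array values are made up for illustration and have nothing to do with the WAF table:

```sql
-- element_at uses 1-based indexes; a negative index counts from the end.
SELECT
  element_at(ARRAY['first', 'second', 'third'], 1)  AS first_element,
  element_at(ARRAY['first', 'second', 'third'], -1) AS last_element;
```

Unlike the subscript operator ([]), element_at returns NULL instead of failing when the index is past the end of the array, which is convenient when filter leaves no elements.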
To return only the ruleid element from the JSON:
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
json_extract(
element_at(
filter(
rulegrouplist,
rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
),
1
),
'$.terminatingrule.ruleid'
) AS ruleid
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
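One detail worth noting: json_extract returns a value of type JSON, so the ruleid comes back as a JSON string, quotes included. If a plain varchar is preferred, json_extract_scalar can be used for the outer call instead. The sketch below tries this on a literal array so it can be run without the table. The sample CTE and its two shortened elements are assumptions standing in for one row of waf_logs, with rulegrouplist as array<string>:

```sql
-- Hypothetical sample data standing in for one row of waf_logs;
-- each array element is a JSON document stored as a string.
WITH sample AS (
  SELECT ARRAY[
    '{"rulegroupid": "AWS#AWSManagedRulesLinuxRuleSet", "terminatingrule": "null"}',
    '{"rulegroupid": "AWS#AWSManagedRulesCommonRuleSet", "terminatingrule": {"action": "BLOCK", "ruleid": "NoUserAgent_HEADER"}}'
  ] AS rulegrouplist
)
SELECT
  -- json_extract_scalar returns varchar rather than JSON
  json_extract_scalar(
    element_at(
      filter(
        rulegrouplist,
        -- drop elements whose terminatingrule is the JSON string "null"
        rulegroup -> json_extract(rulegroup, '$.terminatingrule') <> CAST('null' AS JSON)
      ),
      1
    ),
    '$.terminatingrule.ruleid'
  ) AS ruleid
FROM sample;
```

With this sample data the query should yield NoUserAgent_HEADER as an unquoted string. Whether the filter matches real logs depends on terminatingrule actually being stored as the string "null", as in the example output in the question.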
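I had the same issue, but the solution posted by Theo didn't work for me, even though the table was created according to the instructions linked in the original question.
Here is what worked for me. It is basically the same as Theo's solution, but without the JSON conversion: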
SELECT
from_unixtime(timestamp / 1000e0) AS date,
action,
httprequest.clientip AS ip,
httprequest.uri AS request,
httprequest.country as country,
terminatingruleid,
rulegrouplist,
element_at(filter(ruleGroupList, ruleGroup -> ruleGroup.terminatingRule IS NOT NULL),1).terminatingRule.ruleId AS ruleId
FROM waf_logs
WHERE action='BLOCK'
ORDER BY date DESC
LIMIT 100;