Use jq to count on multiple levels
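We have found some domains associated with an infection. We now have a list of DNS names in a .json file, and I would like to produce a summary showing: the list of users, the unique domains each one visited, and the total count. Bonus points if I can also get a count per domain.
Here is a sample of the file: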
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
Ideally, I would like output similar to:
{"possible_victim01": "total": 3, {"evil.com": 2, "soevil.com": 1}}
{"possible_victim02": "total": 1, {"bad.com": 1}}
{"possible_victim03": "total": 1, {"soevil.com": 1}}
I would happily accept:
{"possible_victim01": "total": 3, ["evil.com", "soevil.com"]}
{"possible_victim02": "total": 1, ["bad.com"]}
{"possible_victim03": "total": 1, ["soevil.com"]}
I can get the total number of records per user, but I lose the list of domains:
cat sample.json | jq -s 'group_by(.machine) | map({machine:.[0].machine,domain:.[0].domain, count:length}) '
[{"machine": "possible_victim01", "domain": "evil.com", "count": 3},
{"machine": "possible_victim02", "domain": "bad.com", "count": 1},
{"machine": "possible_victim03", "domain": "soevil.com", "count": 1}]
This post describes how to solve the back half of the problem... I have not found anything that describes the first half, getting to:
{"machine": "possible_victim01", "domain": "evil.com", "count":2}
{"machine": "possible_victim01", "domain": "soevil.com", "count":1}
{"machine": "possible_victim02", "domain": "bad.com", "count":1}
{"machine": "possible_victim03", "domain": "soevil.com", "count":1}
You need to perform group_by twice: once to group by machine name, and then a sub-grouping within each machine to get the per-domain counts.
jq query:
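# Outer grouping: one array of records per machine
group_by(.machine) | map({
    "machine": .[0].machine,
    "total": length,
    # Inner grouping: one key/value pair per domain, merged into an object
    "domains": (group_by(.domain) | map({
        "key": .[0].domain,
        "value": length}) | from_entries
    )
})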
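Sample output (run the query with jq -s so that the input lines are slurped into one array; the query returns these objects as the elements of an array):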
{
"machine": "possible_victim01",
"total": 3,
"domains": {
"evil.com": 2,
"soevil.com": 1
}
}
{
"machine": "possible_victim02",
"total": 1,
"domains": {
"bad.com": 1
}
}
{
"machine": "possible_victim03",
"total": 1,
"domains": {
"soevil.com": 1
}
}
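For readers who want to cross-check the two-level grouping outside jq, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the original answer: the function name `count_by_machine` and the embedded `SAMPLE` string are assumptions made for this example.

```python
import json
from collections import Counter

# The five sample records from the question, one JSON object per line.
SAMPLE = """\
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
"""


def count_by_machine(lines):
    """Group records by machine, then count domains inside each group."""
    per_machine = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # First level: machine. Second level: a Counter keyed by domain.
        per_machine.setdefault(record["machine"], Counter())[record["domain"]] += 1
    # One summary per machine, sorted by name (jq's group_by also sorts by key).
    return [
        {"machine": m, "total": sum(c.values()), "domains": dict(sorted(c.items()))}
        for m, c in sorted(per_machine.items())
    ]


if __name__ == "__main__":
    for summary in count_by_machine(SAMPLE.splitlines()):
        print(json.dumps(summary))
```

The per-machine total is simply the sum of the per-domain counts, which is why a single pass that only tracks domain counts is enough.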
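Using group_by as described is fine, but if you have a very large number of lines (i.e. JSON entities) to read, as the sample suggests, you may run into performance problems and/or capacity limits.
With any version of jq that has the "inputs" builtin (e.g. jq 1.5rc1), these concerns can be addressed quite efficiently.
Note that to use "inputs" you invoke jq with the -n option, like this: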
jq -n -f program.jq data.json
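Note also that it is best to produce JSON output here (the output requested in the question is not valid JSON); the following seems close to what is wanted:
{"possible_victim01": { "total": 3, "evildoers": {"evil.com": 2, "soevil.com": 1} },
"possible_victim02": ...}
The program below could be made more concise, but the presentation here is meant to make the process transparent, assuming a basic understanding of jq. If there is any magic here, it is in not having to treat "null" as a special case.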
reduce inputs as $line
  ({};
   . as $in
   | ($line.machine) as $machine
   | ($line.domain) as $domain
   | ($in[$machine].evildoers) as $evildoers
   # null + 1 is 1 in jq, so a first-seen machine or domain needs no special case
   | . + { ($machine): {"total": (1 + $in[$machine]["total"]),
                        "evildoers": ($evildoers | (.[$domain] += 1)) }} )
With the sample input provided, the output is:
{
"possible_victim01": {
"total": 3,
"evildoers": {
"evil.com": 2,
"soevil.com": 1
}
},
"possible_victim02": {
"total": 1,
"evildoers": {
"bad.com": 1
}
},
"possible_victim03": {
"total": 1,
"evildoers": {
"soevil.com": 1
}
}
}
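The streaming fold above can be sketched in Python as well, for comparison. This is an illustration under stated assumptions, not the answer's own code: `summarize` and `SAMPLE_LINES` are names invented for the example, and `dict.get(key, 0)` stands in for jq's behaviour of treating a missing (null) count as zero when adding 1.

```python
import json

# The five sample records from the question, as a list of lines.
SAMPLE_LINES = [
    '{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}',
    '{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}',
    '{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}',
    '{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}',
    '{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}',
]


def summarize(lines):
    """Fold records into {machine: {"total": n, "evildoers": {domain: n}}}."""
    summary = {}
    for line in lines:  # one record at a time; the input is never held in full
        if not line.strip():
            continue
        record = json.loads(line)
        entry = summary.setdefault(record["machine"], {"total": 0, "evildoers": {}})
        entry["total"] += 1
        domains = entry["evildoers"]
        # A domain not seen before counts as 0, so no special case is needed.
        domains[record["domain"]] = domains.get(record["domain"], 0) + 1
    return summary


if __name__ == "__main__":
    print(json.dumps(summarize(SAMPLE_LINES), indent=2))
```

Because `summarize` accepts any iterable of lines, an open file object can be passed directly, which keeps memory use proportional to the number of distinct machines and domains rather than to the number of records.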
Here is a solution that uses reduce, getpath and setpath:
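reduce .[] as $o (
    {}
  ; [$o.machine, "total"] as $p1                # path to the machine's overall count
  | [$o.machine, "domains", $o.domain] as $p2   # path to the per-domain count
  | setpath($p1; 1 + getpath($p1))              # getpath yields null if absent; 1 + null is 1
  | setpath($p2; 1 + getpath($p2))
)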
If filter.jq contains this filter and data.json contains the sample data, then the command
$ jq -M -s -f filter.jq data.json
produces
{
"possible_victim01": {
"total": 3,
"domains": {
"evil.com": 2,
"soevil.com": 1
}
},
"possible_victim02": {
"total": 1,
"domains": {
"bad.com": 1
}
},
"possible_victim03": {
"total": 1,
"domains": {
"soevil.com": 1
}
}
}