Bash script that counts IDs in square brackets every ten minutes
I have this log file:
20180917084726:-
20180917085418:[111783178, 111557953, 111646835, 111413356, 111412662, 105618372, 111413557]
20180917115418:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917105419:[111413432, 111633904, 111783198, 111792767, 111557948, 111413225, 111413281]
20180917085522:[111344871, 111394583, 111295547, 111379566, 111352520]
20180917090022:[111344871, 111394583, 111295547, 111379566, 111352520]
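The format of the input log is one entry per line: a timestamp, a colon, then either a bracketed list of IDs or - when there are none.
The timestamp format is YYYYMMDDhhmmss.
I would like to know how to write a script that outputs, for each ten-minute slice of the day, one line with the count of unique IDs returned.
The result should look like this: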
20180917084:0
20180917085:12
20180917115:7
20180917105:7
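Perl to the rescue!
perl -ne '
    # The first run of digits is the timestamp; the remaining runs are the IDs.
    ($timestamp, @ids) = /([0-9]+)/g;
    # Drop the last three digits (minute units and seconds) to get the ten-minute slice.
    substr $timestamp, -3, 3, "";
    # Hash slice: each ID becomes a key, so duplicates within a slice collapse.
    @{ $seen{$timestamp} }{@ids} = ();
    END {
        for my $timestamp (sort keys %seen) {
            print "$timestamp:", scalar keys %{ $seen{$timestamp} }, "\n";
        }
    }' < file.log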
awk: use a colon or a comma as the field separator.
awk -F '[,:]' '
    {
        key = substr($1,1,11)"0"
        count[key] += ($2 == "-" ? 0 : NF-1)
    }
    END {
        # sorted traversal is a gawk feature
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (key in count) print key, count[key]
    }
' file
201809170840 0
201809170850 12
201809170900 5
201809171050 7
201809171150 7
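Note that the awk above adds up the IDs line by line, so an ID that shows up twice inside the same ten-minute slice is counted twice. The sample data has no such repeats, which is why the totals match. If distinct IDs are what you need, here is a minimal sketch, assuming the line format shown in the question; count_unique_ids is just a name made up for illustration, and the output uses the timestamp:count layout from the question.

```shell
# Count distinct IDs per ten-minute slice (hypothetical helper, not from the answers above).
# Assumes lines look like "YYYYMMDDhhmmss:[id, id, ...]" or "YYYYMMDDhhmmss:-".
count_unique_ids() {
  awk -F: '
    {
      key = substr($1, 1, 11)               # drop the minute units and the seconds
      if (!(key in total)) total[key] = 0   # report slices that have no IDs as 0
      ids = $2
      gsub(/[^0-9,]/, "", ids)              # strip brackets, spaces and the "-" placeholder
      n = split(ids, id, ",")
      for (i = 1; i <= n; i++)
        if (id[i] != "" && !seen[key, id[i]]++) total[key]++
    }
    END { for (key in total) print key ":" total[key] }
  ' "$@" | sort
}
```

Call it as count_unique_ids file.log, or feed the log on stdin. The final sort is there because the traversal order of for (key in total) is unspecified outside gawk.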
To filter for today's date, you can say:
gawk -F '[,:]' '
    BEGIN {today = strftime("%Y%m%d", systime())}
    $0 ~ "^"today { key = ...
or
awk -F '[,:]' -v "today=$(date "+%Y%m%d")" '
    $0 ~ "^"today { key = ...
or pipe the output of the existing awk code into | grep "^$(date +%Y%m%d)"
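For reference, here is one way the date filter could be wrapped up so that the day is a parameter rather than always today. This is only a sketch: count_for_day is a made-up name, the body assumes the same per-line summing as the awk answer above, and it sorts with sort instead of relying on gawk.

```shell
# Sum the IDs per ten-minute slice for a single day (hypothetical helper).
# Usage: count_for_day YYYYMMDD [file ...]   (reads stdin when no file is given)
count_for_day() {
  day=$1
  shift
  awk -F '[,:]' -v day="$day" '
    index($0, day) == 1 {                   # keep only lines that start with the requested day
      key = substr($1, 1, 11) "0"
      count[key] += ($2 == "-" ? 0 : NF - 1)
    }
    END { for (key in count) print key, count[key] }
  ' "$@" | sort
}
```

For today's entries that would be count_for_day "$(date +%Y%m%d)" file.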
Could you please try the following; it gives you the output in the same order in which the timestamps occur in Input_file.
awk '
{
  val=substr($0,1,11)
}
!a[val]++{
  b[++count]=val
}
match($0,/\[.*\]/){
  num=split(substr($0,RSTART,RLENGTH),array,",")
  c[val]+=num
}
END{
  for(i=1;i<=count;i++){
    print b[i],c[b[i]]+0
  }
}' Input_file
The output is as follows.
20180917084 0
20180917085 12
20180917115 7
20180917105 7
20180917090 5
EDIT: Adding a solution in case any of your fields has a NULL (empty) value; the code below now checks for that as well.
awk '
{
  val=substr($0,1,11)
}
!a[val]++{
  b[++count]=val
}
match($0,/\[.*\]/){
  count1=""
  num=split(substr($0,RSTART,RLENGTH),array,",")
  for(j=1;j<=num;j++){
    if(array[j]){
      count1++
    }
  }
  c[val]+=count1
}
END{
  for(i=1;i<=count;i++){
    print b[i],c[b[i]]+0
  }
}' Input_file
Your input and output are inconsistent, but I guess you want something like this:
$ awk -F: '{k=sprintf("%10d",$1/1000); n=gsub(",",",",$2); a[k]+=(n?n+1:n)}
           END {for(k in a) print k":"a[k] | "sort" }' file
20180917084:0
20180917085:12
20180917090:5
20180917105:7
20180917115:7