How to find the first and last occurrence of each job in a sample log
My application has a log that looks like this:
{Fri Mar 16 19:07:47 Program: job-a: <blah><blah>
Fri Mar 16 19:07:47 Program: job-a: <blah><blah>
Fri Mar 16 19:07:48 Program: job-b: <blah><blah>
Fri Mar 16 19:07:48 Program: job-b: <blah><blah>
Fri Mar 16 19:07:50 Program: job-b: <blah><blah>
Fri Mar 16 19:07:51 Program: job-b: <blah><blah>
Fri Mar 16 19:07:52 Program: job-a: <blah><blah>
Fri Mar 16 19:07:52 Program: job-a: <blah><blah>
Fri Mar 16 19:07:53 Program: job-a: <blah><blah>
Fri Mar 16 19:07:54 Program: job-a: <blah><blah>
Fri Mar 16 19:07:55 Program: job-a: <blah><blah>
Fri Mar 16 19:08:00 Program: job-a: <blah><blah>
Fri Mar 16 19:08:01 Program: job-a: <blah><blah>
Fri Mar 16 20:33:52 Program: job-c: <blah><blah>
Fri Mar 16 20:45:56 Program: job-c: <blah><blah>}
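In this case, for each job name (job-a, job-b, job-c), I need to find the first and last occurrence of its lines to identify the start and end times.
That is, I need to output the program/job name, start_time, and end_time, as in the sample output below. I have shown the expected output as comma-separated, but I don't really care about the delimiter, since I am only interested in the values. Ignore the curly braces at the start and end of the sample input/output.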
job-a, Fri Mar 16 19:07:47, Fri Mar 16 19:08:01
job-b, Fri Mar 16 19:07:48, Fri Mar 16 19:07:51
job-c, Fri Mar 16 20:33:52, Fri Mar 16 20:45:56
Here is an example in Perl:
use feature qw(say);
use strict;
use warnings;

my $fn = 'log.txt';
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my %jobs;
while (my $line = <$fh>) {
    chomp $line;
    # Capture everything before "Program:" as the date, and the job name after it.
    my ($date, $job) = $line =~ /^(.*?)\s*Program:\s*(job.*?):/;
    next unless defined $job;          # skip lines that do not match
    $jobs{$job}{start} //= $date;      # set only on the first occurrence
    $jobs{$job}{end}     = $date;      # overwritten on every occurrence
}
close $fh;

# Print one line per job: name, first timestamp, last timestamp.
for my $job (sort keys %jobs) {
    say join ", ", $job, $jobs{$job}{start}, $jobs{$job}{end};
}
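You can use awk. Here I am just showing how to get the first and last occurrence of each job, keyed on the sixth field (the job name) and storing the fourth field (the time). Note that the order in which `for (x in last)` visits the keys is not guaranteed.
awk '!first[$6]{ first[$6]=$4 } { last[$6]=$4 }
END{ for (x in last) print x, first[x], last[x] }' OFS=', ' infile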
job-a:, 19:07:47, 19:08:01
job-b:, 19:07:48, 19:07:51
job-c:, 20:33:52, 20:45:56
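If you want output closer to what the question asks for (the full timestamp, and no trailing colon on the job name), a variant along these lines might work. Treat it as a sketch rather than a finished solution: `job_spans` is a name made up for this example, and it assumes every relevant line has the shape `Day Mon DD HH:MM:SS Program: job-name: ...` with no literal curly braces in the real file.

```shell
# Hypothetical helper: print "job, first timestamp, last timestamp" for each job
# found in the log file given as $1, in order of first appearance.
job_spans() {
  awk -v OFS=', ' '
    $5 == "Program:" {
      job = $6
      sub(/:$/, "", job)                  # drop the trailing colon from the job name
      stamp = $1 " " $2 " " $3 " " $4     # rebuild the full timestamp
      if (!(job in first)) {              # first time this job is seen
        first[job] = stamp
        order[++n] = job                  # remember first-seen order
      }
      last[job] = stamp                   # overwritten on every occurrence
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i], first[order[i]], last[order[i]]
    }
  ' "$1"
}
```

Call it as `job_spans log.txt`. Keeping a separate `order` array avoids relying on the unspecified iteration order of `for (x in last)`, and a job that appears only once gets the same timestamp for start and end.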