Perl script to process a text file
I have a file that, ideally, has the following format:
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
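Ideally, each data set starts with a "Status_" line and ends with a "RawCaptureTimeStamp" line, and data sets are separated by 2 newlines.
The problem is that in the non-ideal case the file may look like this: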
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
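As shown above, the first and last data sets are invalid. I need logic that removes these unwanted data sets from the original file and saves it again.
I have tried several things in Perl, but they all failed. Please help.
This is the code I use to read the file: it checks whether the file starts with Status and, if not, reads on until it reaches RawCaptureTimeStamp.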
while( my $line = <$cap_1>){
    if($. == 1 && $line !~ /^Status/){ #check if first line doesn't begin with status
        while($line = <$cap_1>){ #if not, read till the occurrence of RawCaptureTimeStamp
            if($line =~/^RawCaptureTimeStamp/){
                $. = $.+1;
                last;
            }
        }
        $line = <$cap_1>;
        if (eof()){ #After reading till raw capture timestamp, check for EOF
            last;
        }
    }
}
#! /usr/bin/perl
use warnings;
use strict;
$_ = q();
$_ = <> until /^Status_/; # Skip the invalid beginning;
my $block = $_;
while (<>) {
    if (/^RawCaptureTimeStamp/) { # End of block: print it, start gathering a new one.
        print $block, $_;
        $block = q();
    } else { # Inside of a block.
        $block .= $_;
    }
}
The last block will not be printed if it does not end properly.
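The same line-by-line idea can be sketched as a small function, which makes it easier to see what happens to a truncated first or last block. This is only a sketch under some assumptions: `complete_blocks` is a made-up name, and I am assuming that a blank line arriving before the `RawCaptureTimeStamp` line means the block was cut short.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the complete blocks found in a list of lines.
sub complete_blocks {
    my @lines = @_;
    my @blocks;
    my $block;                      # undef while we are outside a block

    for my $line (@lines) {
        if (!defined $block) {
            # Outside a block: only a Status_ line may open one,
            # so a truncated beginning is skipped.
            $block = $line if $line =~ /^Status_/;
        }
        elsif ($line =~ /^RawCaptureTimeStamp/) {
            # Proper end of block: keep it.
            push @blocks, $block . $line;
            undef $block;
        }
        elsif ($line =~ /^\s*$/) {
            # Assumption: a blank line before the terminator means truncation.
            undef $block;
        }
        else {
            $block .= $line;        # inside a block
        }
    }

    return @blocks;                 # an unterminated trailing block is dropped
}
```

Something like `print join "\n\n", complete_blocks(<>);` would then print only the complete data sets. Unlike the one-liner skip loop, this version also stops cleanly when the input contains no `Status_` line at all.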
This works, I believe:
#!/usr/bin/env perl
use strict;
use warnings;
$/ = "\n\n";
while (<>)
{
    s/^\s+//;
    s/\s+$//;
    print "\n[[", $_, "]]\n"
        if (m/^Status_\w+ .*Status_\w+ /ms && m/^RawCaptureTimeStamp /m);
}
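Setting $/ to "\n\n" reads up to a double newline (or EOF), effectively reading one paragraph at a time. The if condition looks for two Status_ elements and a RawCaptureTimeStamp; you can refine these conditions to make them stricter as needed. The s modifier allows .* to match embedded newlines; the m modifier enables multi-line mode, so ^ can match at the start of any line within the paragraph. As written, it would be fine for RawCaptureTimeStamp to be followed by other lines, for example.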
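To see what the two modifiers do in isolation, the condition can be pulled out into a predicate and tried on individual paragraphs. A minimal sketch; `is_complete` is a name invented for illustration, and the regular expressions are the ones from the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a paragraph looks like a complete data set.
sub is_complete {
    my ($para) = @_;

    # /m lets ^ match at the start of any line in the paragraph;
    # /s lets .* run across the newlines between the two Status_ lines.
    my $has_two_status = $para =~ /^Status_\w+ .*Status_\w+ /ms;
    my $has_timestamp  = $para =~ /^RawCaptureTimeStamp /m;

    return ($has_two_status && $has_timestamp) ? 1 : 0;
}
```

A paragraph cut off at the top fails the first test because it has no `Status_` lines, and one cut off at the bottom fails the second because it has no `RawCaptureTimeStamp` line.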
Sample data, copied from the question:
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
Sample output:
[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]
[[Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]
[[Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]
[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]
Using Perl's paragraph mode, as described here:
#!/usr/bin/perl -w
use strict;
local $/ = "";
while (my $para = <DATA>) {
    print $para if ($para =~ /^Status_.*RawCaptureTimeStamp/s);
}
__DATA__
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
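I would read the file in paragraph mode (setting $/ to "" rather than the "\n\n" from your question) and check each paragraph for consistency.
The three newlines at the end of each block have to be put back, because in this mode PerlIO normalizes them to two.
It looks like the problem is that the data may be truncated at either end, so I require the timestamp to be ten digits, which covers dates from 2001 to 2286.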
use strict;
use warnings 'all';
local $/ = ''; # Separate reads by one or more blank lines
while ( <> ) {
    next unless /^Status.+\nStatus/ and /^RawCaptureTimeStamp = \d{10}/m;
    s/\s*\z/\n\n\n/;
    print;
}
Output (using the bad sample data set)
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
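The question also asks for the cleaned data to be saved back to the original file. One way this might be done, sketched under assumptions: `clean_file` and the `.tmp` scratch name are invented for illustration, a data set counts as complete only if it starts with a `Status_` line and ends with the `RawCaptureTimeStamp` line, and blocks are written back with three newlines between them as in the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path so that only complete data sets remain.
# Returns the number of data sets kept.
sub clean_file {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @keep;
    {
        local $/ = '';                  # paragraph mode
        while (my $para = <$in>) {
            $para =~ s/\s+\z//;         # drop the trailing blank lines
            push @keep, $para
                if $para =~ /\AStatus_/
                && $para =~ /^RawCaptureTimeStamp = \d+\z/m;
        }
    }
    close $in;

    # Write to a scratch file first, then swap it in, so that a failure
    # part-way through does not destroy the original.
    my $tmp = "$path.tmp";              # assumed scratch name
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} join("\n\n\n", @keep), "\n" if @keep;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot replace $path: $!";

    return scalar @keep;
}
```

Calling `clean_file('capture.txt')` (the file name is just an example) would leave only the complete data sets in that file. Perl's `-i` switch is another option for in-place editing if you prefer to keep the filter as a one-pass script.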