Perl script to process a text file

I have a file that, ideally, is formatted like this:

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580

Ideally, each data set starts with a "Status_" line and ends with a "RawCaptureTimeStamp" line, and the data sets are separated by two blank lines.

The problem is that, in the non-ideal case, the file may look like this:

1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"

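As shown above, the first and last data sets are invalid. I need logic that removes these unwanted data sets from the original file and saves it again. I have tried a few things in Perl, but they all failed. Please help. Below is the code I use to read the file and check whether it starts with Status; if it does not, the code reads on until it reaches RawCaptureTimeStamp.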

while( my $line = <$cap_1>){
    if($. == 1 && $line !~ /^Status/){ #check if first line doesn't begin with status
        while($line = <$cap_1>){ # if not, read until the occurrence of RawCaptureTimeStamp
            if($line =~/^RawCaptureTimeStamp/){
                $. = $.+1;
                last;
            }
        }
        $line = <$cap_1>; 
        if (eof()){ #After reading till raw capture timestamp, check for EOF
            last;
        }
    }
}
#! /usr/bin/perl
use warnings;
use strict;

$_ = q();
$_ = <> until !defined($_) || /^Status_/; # Skip the invalid beginning.
exit unless defined $_;                   # No Status_ line at all: nothing to print.

my $block = $_;

while (<>) {
    if (/^RawCaptureTimeStamp/) {  # End of block: print it, start gathering a new one.
        print $block, $_;
        $block = q();

    } else {                       # Inside of a block.
        $block .= $_;
    }
}

The last block will not be printed if it is not properly terminated.
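Note that the script above only skips an invalid beginning; a block without its Status_ lines later in the file would still be printed. A line-by-line sketch that also drops such blocks might look like the following. It assumes that blocks are separated by at least one blank line; `complete_blocks` and the shortened demo lines are hypothetical, not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the complete blocks found in a list of lines.
sub complete_blocks {
    my @lines = @_;
    my (@blocks, $block);
    for my $line (@lines) {
        if ($line =~ /^Status_/ and not defined $block) {
            $block = $line;               # first Status_ line opens a block
        }
        elsif ($line =~ /^\s*$/) {
            undef $block;                 # blank line: abandon any unfinished block
        }
        elsif (defined $block) {
            $block .= $line;
            if ($line =~ /^RawCaptureTimeStamp/) {
                push @blocks, $block;     # properly terminated: keep it
                undef $block;
            }
        }
        # Lines seen while no block is open belong to a headless block: skip them.
    }
    return @blocks;
}

# Shortened demo input: headless block, complete block, truncated block.
my @demo = map { "$_\n" } (
    '1 = "NNMi"', 'RawCaptureTimeStamp = 1450091580', '', '',
    'Status_ArsFlag = ""', '1 = "NNMi"', 'RawCaptureTimeStamp = 1450091580', '', '',
    'Status_ArsFlag = ""', '1 = "NNMi"',
);
print join("\n\n", complete_blocks(@demo));
```

Only the middle block of the demo input is printed, because it is the only one that both opens with a Status_ line and closes with RawCaptureTimeStamp.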

This works, I believe:

#!/usr/bin/env perl
use strict;
use warnings;

$/ = "\n\n";

while (<>)
{
    s/^\s+//;
    s/\s+$//;
    print "\n[[", $_, "]]\n"
        if (m/^Status_\w+ .*Status_\w+ /ms && m/^RawCaptureTimeStamp /m);
}

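Setting $/ to "\n\n" makes each read go up to a double newline (or EOF), which effectively reads one paragraph at a time. The if condition looks for two Status_ elements and a RawCaptureTimeStamp; you can refine these conditions to make them more stringent as needed. The s modifier allows .* to match embedded newlines; the m modifier enables multi-line mode, so ^ matches at the start of every line. As written, for example, the condition would accept a paragraph in which RawCaptureTimeStamp is followed by other lines.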
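One way to tighten the condition, as a minimal sketch: require that the paragraph starts with a Status_ line and that RawCaptureTimeStamp is its last line. `is_complete_block` is a hypothetical helper name, and the three sample paragraphs are shortened versions of the data in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true only if the paragraph starts with a Status_ line
# and its last line is a RawCaptureTimeStamp line.
sub is_complete_block {
    my ($para) = @_;
    $para =~ s/^\s+//;    # strip leading blank lines
    $para =~ s/\s+$//;    # strip trailing blank lines
    return $para =~ /\AStatus_\w+ = /
        && $para =~ /^RawCaptureTimeStamp = \d+\z/m;
}

my @paras = (
    qq{Status_ArsFlag = ""\nStatus_NodeAlias = ""\n1 = "NNMi"\nRawCaptureTimeStamp = 1450091580\n\n},
    qq{1 = "NNMi"\nRawCaptureTimeStamp = 1450091580\n\n},   # no Status_ lines
    qq{Status_ArsFlag = ""\n1 = "NNMi"\n},                   # no timestamp
);

# Prints "keep", "drop", "drop" (one per line).
printf "%s\n", is_complete_block($_) ? 'keep' : 'drop' for @paras;
```

Here \A and \z anchor to the very start and end of the trimmed paragraph, so a paragraph with extra lines after the timestamp is rejected.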

Sample data, copied from the question:

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"

Sample output:

[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

Use Perl's paragraph mode (setting $/ to the empty string):

#!/usr/bin/perl -w

use strict;

local $/ = "";

while (my $para = <DATA>) {
    print $para if ($para =~ /^Status_.*RawCaptureTimeStamp/s);
}

__DATA__
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"

For your problem, I would read the file in paragraph mode (setting $/ to "" rather than "\n\n") and check each paragraph for consistency.

Three newlines have to be substituted at the end of each block, because in this mode PerlIO normalises them to two.

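It looks like the problem is that the data may be truncated at either end, so I require the timestamp to have ten digits, which covers dates from 2001 to 2286.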

use strict;
use warnings 'all';

local $/ = ''; # Separate reads by one or more blank lines

while ( <> ) {

    next unless /^Status.+\nStatus/ and /^RawCaptureTimeStamp = \d{10}/m;
    s/\s*\z/\n\n\n/;

    print;
}

Output (using the faulty sample data set)

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
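
The question also asks for the original file to be saved again without the invalid data sets. A minimal sketch of that step, assuming it is acceptable to write a scratch copy next to the original and then rename it over the original: `clean_file` and the `.tmp` suffix are illustrative choices, not something the answers above prescribe. The check is slightly stricter than the one above in that the timestamp must be the last line of the block.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path so that only complete blocks remain.
# Returns the number of blocks kept.
sub clean_file {
    my ($path) = @_;
    my $tmp = "$path.tmp";    # assumed scratch name next to the original

    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    local $/ = '';            # paragraph mode
    my $kept = 0;
    while (my $para = <$in>) {
        # Keep a block only if it opens with Status_ and its last line
        # is a ten-digit RawCaptureTimeStamp.
        next unless $para =~ /\A\s*Status_/
                and $para =~ /^RawCaptureTimeStamp = \d{10}\s*\z/m;
        $para =~ s/\s*\z/\n\n\n/;    # restore the two blank lines after the block
        print {$out} $para;
        $kept++;
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot replace $path: $!";
    return $kept;
}

# Clean every file named on the command line.
clean_file($_) for @ARGV;
```

Writing to a separate file and renaming it afterwards means the original stays intact if the script dies part-way through.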