Split Meta Type Data With A Regex

I have an array that stores data like this:

<WebPage>
<Action>Action Goes Here 1</Action>
<SystemData>SystemData Goes Here 1</SystemData>
<PageSatausData>PageSatausData Goes Here 1</PageSatausData>
<PageNameData>PageNameData Goes Here 1</PageNameData>
<TitleData>TitleData Goes Here 1</TitleData>
<KeywordData>KeywordData Goes Here 1</KeywordData>
<DescriptionData>DescriptionData Goes Here 1</DescriptionData>
<HeaderData>HeaderData Goes Here 1</HeaderData>
<BodyData>BodyData Goes Here 1</BodyData>
<FooterData>FooterData Goes Here 1</FooterData>
</WebPage>
<WebPage>
<Action>Action Goes Here 2</Action>
<SystemData>SystemData Goes Here 2</SystemData>
<PageSatausData>PageSatausData Goes Here 2</PageSatausData>
<PageNameData>PageNameData Goes Here 2</PageNameData>
<TitleData>TitleData Goes Here 2</TitleData>
<KeywordData>KeywordData Goes Here 2</KeywordData>
<DescriptionData>DescriptionData Goes Here 2</DescriptionData>
<HeaderData>HeaderData Goes Here 2</HeaderData>
<BodyData>BodyData Goes Here 2</BodyData>
<FooterData>FooterData Goes Here 2</FooterData>
</WebPage>

What I want to do is loop through it and assign each value to a variable, like this:

foreach my $Line (@Meta_Content) {

my($Var1,$Var2,$Var3,$Var4,$Var5,$Var6,$Var7,$Var8,$Var9,$Var10) = split (/\>\</,$Line,10);

print "Result: $Var1,$Var2,$Var3,$Var4,$Var5,$Var6,$Var7,$Var8,$Var9,$Var10<br>";
 }

I know about the XML modules, but unfortunately in this case I need a regex to do the job, so a module is not an option.

Here it is:

#!/usr/bin/perl
use strict; use warnings; use Data::Dumper;
my $hash;


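while (<DATA>) {

    if ( /<WebPage>/ ) {
        $hash = {};              # opening tag: start a new record
    }
    elsif ( /<\/WebPage>/ ) {
        print Dumper $hash;      # closing tag: the record is complete
    }
    elsif ( /^<(.+)>(.+)<\/\1>\s*/ ) {
        $hash->{$1} = $2;        # tag name => text between the tags
    }
}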

__DATA__
<WebPage>
<Action>Action Goes Here 1</Action>
<SystemData>SystemData Goes Here 1</SystemData>
<PageSatausData>PageSatausData Goes Here 1</PageSatausData>
<PageNameData>PageNameData Goes Here 1</PageNameData>
<TitleData>TitleData Goes Here 1</TitleData>
<KeywordData>KeywordData Goes Here 1</KeywordData>
<DescriptionData>DescriptionData Goes Here 1</DescriptionData>
<HeaderData>HeaderData Goes Here 1</HeaderData>
<BodyData>BodyData Goes Here 1</BodyData>
<FooterData>FooterData Goes Here 1</FooterData>
</WebPage>
<WebPage>
<Action>Action Goes Here 2</Action>
<SystemData>SystemData Goes Here 2</SystemData>
<PageSatausData>PageSatausData Goes Here 2</PageSatausData>
<PageNameData>PageNameData Goes Here 2</PageNameData>
<TitleData>TitleData Goes Here 2</TitleData>
<KeywordData>KeywordData Goes Here 2</KeywordData>
<DescriptionData>DescriptionData Goes Here 2</DescriptionData>
<HeaderData>HeaderData Goes Here 2</HeaderData>
<BodyData>BodyData Goes Here 2</BodyData>
<FooterData>FooterData Goes Here 2</FooterData>
</WebPage>
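
The same loop can be pointed at the array from the question instead of the DATA handle. The following is only a minimal sketch, assuming that @Meta_Content holds one line of markup per element (the question does not say so explicitly). The helper name parse_pages and the shortened sample data are made up for illustration. It collects one hash per <WebPage> block rather than dumping each one as it closes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of lines into a list of hash references,
# one per <WebPage> ... </WebPage> block.
sub parse_pages {
    my @lines = @_;
    my ( @pages, $current );
    for my $line (@lines) {
        if ( $line =~ /<WebPage>/ ) {
            $current = {};                        # start a new record
        }
        elsif ( $line =~ m{</WebPage>} ) {
            push @pages, $current if $current;    # record is complete
            undef $current;
        }
        elsif ( $current and $line =~ m{^\s*<(\w+)>(.*?)</\1>\s*$} ) {
            $current->{$1} = $2;                  # tag name => content
        }
    }
    return @pages;
}

# Shortened sample input; assumed layout is one line per array element.
my @Meta_Content = (
    '<WebPage>',
    '<Action>Action Goes Here 1</Action>',
    '<TitleData>TitleData Goes Here 1</TitleData>',
    '</WebPage>',
    '<WebPage>',
    '<Action>Action Goes Here 2</Action>',
    '<TitleData>TitleData Goes Here 2</TitleData>',
    '</WebPage>',
);

for my $page ( parse_pages(@Meta_Content) ) {
    print "Action: $page->{Action}, Title: $page->{TitleData}\n";
}
```

Keying each value by its tag name means a field can be looked up as $page->{TitleData} instead of by position, so the code keeps working if the order of the tags changes.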

Instead of this:

my($Var1,$Var2,$Var3,$Var4,$Var5,$Var6,$Var7,$Var8,$Var9,$Var10) = split (/\>\</,$Line,10);

print "Result: $Var1,$Var2,$Var3,$Var4,$Var5,$Var6,$Var7,$Var8,$Var9,$Var10<br>";

you can write this:

my @pieces = split (/\>\</,$Line,10);
my $str = join ',', @pieces;    # comma-separated, like the original print
print "Results: $str <br>";

And if you need to refer to an individual item, instead of writing $Var1 you can write $pieces[0]; instead of $Var2 you can write $pieces[1], and so on.

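See how much more concise that is? Beginners in every language try what you did. The rule is: if you find yourself writing variable names that differ only by a number, you should be storing the data in an array.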
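
To show the array rule on the question's data, here is a minimal sketch. It assumes each $Line holds a whole <WebPage> record as a single string, which is what the split in the question implies but is not stated outright. The helper name tag_values and the shortened record are hypothetical. Rather than splitting on "><", it uses a global match to pull the text of every tag pair into one array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text of every <Tag>text</Tag> pair found
# in a record string, in the order the tags appear.
sub tag_values {
    my ($record) = @_;
    my @values;
    # \1 forces the closing tag to have the same name as the opening tag.
    while ( $record =~ m{<(\w+)>([^<]*)</\1>}g ) {
        push @values, $2;
    }
    return @values;
}

# Shortened record, assumed to arrive as one string per array element.
my $Line = '<WebPage><Action>Action Goes Here 1</Action>'
         . '<SystemData>SystemData Goes Here 1</SystemData>'
         . '<TitleData>TitleData Goes Here 1</TitleData></WebPage>';

my @pieces = tag_values($Line);

print "Result: ", join( ',', @pieces ), "<br>\n";
print "First item: $pieces[0]\n";
```

One array replaces the ten numbered scalars, and the code no longer needs to know in advance how many fields a record contains.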