Parsing xml in perl
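I want to parse this XML with Perl. The XML shown here is only part of a much larger nested XML file. I have tried the usual parsers, but most of them give me output as a hash, which makes the child nodes hard to read and access.
I want to get the elements and read all of their attribute values.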
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<TR name="App.exe" total="573" errors="1" failures="2" not-run="4" inconclusive="2" ignored="4" skipped="0" invalid="0" date="2015-01-12" time="17:43:59">
<environment version="2" cversion="44" os-version="Microsoft" platform="Win32NT" cwd="" machine-name="" user="me" user-domain="domain" />
<culture-info current-culture="en-US" current-uiculture="en-US" />
<TS type="Assembly" name="App.exe" executed="True" result="Failure" success="False" time="22" asserts="0">
<RS>
<TS type="Namespace" name="MyAPP" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
<RS>
<TS type="Namespace" name="Project" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
<RS>
<TS type="Namespace" name="Website" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
<RS>
<TS type="Namespace" name="Service" executed="True" result="Failure" success="False" time="2335.163" asserts="0">
<RS>
<TS type="SetUpFixture" name="Tests" executed="True" result="Failure" success="False" time="2335.163" asserts="0">
<RS>
<TS type="Namespace" name="tempt" executed="True" result="Success" success="True" time="8.935" asserts="0">
<RS>
<TS type="ParameterizedFixture" name="TempAPI" executed="True" result="Success" success="True" time="8.935" asserts="0">
<RS>
<TS type="TestFixture" name="Admin" executed="True" result="Success" success="True" time="3.306" asserts="2">
<RS>
<TC name="testName1" executed="True" result="Success" success="True" time="0.352" asserts="0" />
<TC name="testName2" executed="True" result="Success" success="True" time="0.005" asserts="0" />
</RS>
</TS>
<TS type="TestFixture" name="Client" executed="True" result="Success" success="True" time="2.620" asserts="1">
<RS>
<TC name="testName3" executed="True" result="Success" success="True" time="0.319" asserts="0" />
<TC name="testName4" executed="True" result="Success" success="True" time="0.000" asserts="0" />
</RS>
</TS>
<TS type="TestFixture" name="Employee" executed="True" result="Success" success="True" time="3.007" asserts="1">
<RS>
<TC name="testName5" executed="True" result="Success" success="True" time="0.290" asserts="0" />
<TC name="testName6" executed="True" result="Success" success="True" time="0.000" asserts="0" />
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</RS>
</TS>
</TR>
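This is what I tried. As I said, it gives hash output that is hard to read and hard to pull details out of.
use XML::Simple;
use Data::Dumper;
use File::Slurp qw(write_file);

my $list = XMLin('F:\Sample.xml', KeepRoot => 1);
#print $list->{TR}{TS}{name};
print Dumper($list);
write_file('F:\mydump.log', Dumper($list));
I need suggestions for a parser that can produce output that is easier to read than a hash.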
With XML::Simple I get the following output:
$VAR1 = {
'TR' => {
'failures' => '2',
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '22',
'name' => 'App.exe',
'executed' => 'True',
'type' => 'Assembly',
'RS' => {
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '2335.164',
'name' => 'MyAPP',
'executed' => 'True',
'type' => 'Namespace',
'RS' => {
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '2335.164',
'name' => 'Project',
'executed' => 'True',
'type' => 'Namespace',
'RS' => {
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '2335.164',
'name' => 'Web',
'executed' => 'True',
'type' => 'Namespace',
'RS' => {
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '2335.163',
'name' => 'Server',
'executed' => 'True',
'type' => 'Namespace',
'RS' => {
'TS' => {
'asserts' => '0',
'success' => 'False',
'time' => '2335.163',
'name' => 'Tests',
'Client' => {
'success' => 'True',
'asserts' => '1',
'time' => '2.620',
'executed' => 'True',
'type' => 'TestFixture',
'RS' => {
'TC' => {
'testName3' => {
'success' => 'True',
'asserts' => '0',
'time' => '0.319',
'executed' => 'True',
'result' => 'Success'
},
'testName4' => {
'success' => 'True',
'asserts' => '0',
'time' => '0.000',
'executed' => 'True',
'result' => 'Success'
}
}
},
'result' => 'Success'
},
'Admin' => {
'success' => 'True',
'asserts' => '2',
'time' => '3.306',
'executed' => 'True',
'type' => 'TestFixture',
'RS' => {
'TC' => {
'testName1' => {
'success' => 'True',
'asserts' => '0',
'time' => '0.352',
'executed' => 'True',
'result' => 'Success'
},
'testName2' => {
'success' => 'True',
'asserts' => '0',
'time' => '0.005',
'executed' => 'True',
'result' => 'Success'
}
}
},
'result' => 'Success'
}
}
},
'result' => 'Success'
}
},
'result' => 'Success'
}
},
'result' => 'Failure'
}
},
'result' => 'Failure'
}
},
'result' => 'Failure'
}
},
'result' => 'Failure'
}
},
'result' => 'Failure'
}
},
'result' => 'Failure'
},
'culture-info' => {
'current-culture' => 'en-US',
'current-uiculture' => 'en-US'
},
'errors' => '1',
'time' => '17:43:59',
'date' => '2015-01-12',
'not-run' => '4',
'name' => 'App.exe',
'ignored' => '4',
'total' => '573',
'skipped' => '0',
'environment' => {
'user-domain' => 'domain',
'nunit-version' => '2.6.3.13283',
'os-version' => 'Microsoft Windows NT 6.2.9200.0',
'cwd' => '',
'user' => 'me',
'platform' => 'Win32NT',
'clr-version' => '4.0.30319.34014',
'machine-name' => ''
},
'inconclusive' => '2',
'invalid' => '0'
}
};
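Part of why this dump is hard to navigate is XML::Simple's default folding: its default `KeyAttr` is `name`, `key` and `id`, so repeated elements that carry a `name` attribute are folded into a hash keyed on that value (which is why `Client`, `Admin` and `testName3` show up as hash keys), and a lone child becomes a hash while several become an array or a folded hash. A minimal sketch, assuming XML::Simple (and a parser backend for it) is installed, and using a trimmed inline sample in place of `F:\Sample.xml`, shows how the `ForceArray` and `KeyAttr` options make the shape predictable:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Trimmed stand-in for F:\Sample.xml (assumption: same TS/RS/TC layout).
my $xml = <<'XML';
<TR name="App.exe">
  <TS type="TestFixture" name="Admin" result="Success">
    <RS>
      <TC name="testName1" result="Success" time="0.352" />
    </RS>
  </TS>
</TR>
XML

# ForceArray: TS and TC always become array refs, even when there is only one.
# KeyAttr => []: never fold elements into hashes keyed on their name attribute.
my $data = XMLin($xml, ForceArray => [qw(TS TC)], KeyAttr => []);

# Without KeepRoot the TR root is stripped, so TS sits at the top level.
my @report;
my $fixture = $data->{TS}[0];
for my $tc (@{ $fixture->{RS}{TC} }) {
    push @report, "$fixture->{name}/$tc->{name}: $tc->{result}";
}
print "$_\n" for @report;
```

With these options every `TS` and `TC` is reached the same way (`->{TS}[i]`, `->{RS}{TC}[j]`) no matter how many siblings it has, which is easier to loop over than the default output.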
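Following on from the comments: if you only want the TC nodes, you can parse the XML file, walk through the nodes and, whenever a node is tagged TC, extract or print the information you want.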
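A minimal sketch of that "parse, then walk the nodes" approach, assuming XML::Twig is installed and using an inline sample rather than the real file. It loads the whole document and collects every `TC` element wherever it is nested:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed stand-in for F:\Sample.xml (assumption: same TS/RS/TC layout).
my $xml = <<'XML';
<TR name="App.exe">
  <TS type="TestFixture" name="Admin">
    <RS>
      <TC name="testName1" result="Success" time="0.352" />
      <TC name="testName2" result="Success" time="0.005" />
    </RS>
  </TS>
</TR>
XML

# Load the whole document into memory.
my $twig = XML::Twig->new;
$twig->parse($xml);   # for the real file: $twig->parsefile('F:\Sample.xml')

# descendants('TC') returns every TC below the root, in document order.
my @rows;
for my $tc ($twig->root->descendants('TC')) {
    my $atts = $tc->atts;   # hash ref of attribute name => value
    push @rows, join ' ', map { $_ . '=' . $atts->{$_} } sort keys %$atts;
}
print "$_\n" for @rows;
```

Loading the whole tree is the simplest option when the file fits comfortably in memory; for very large files a handler-based approach is usually preferable.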
Alternatively, you could use a regular expression to capture the TC nodes as you read the file, and then extract the information you want.
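A core-Perl sketch of the regex idea, with no XML module needed. It only holds under assumptions that happen to match the sample: each `TC` element sits on a single line, is self-closing, and uses double-quoted attributes. It is not a general XML parser and will break on other layouts (multi-line tags, single quotes, entities, CDATA), so treat it as a quick hack:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash ref of the attributes of a one-line <TC ... /> tag, or nothing.
sub tc_attributes {
    my ($line) = @_;
    return unless $line =~ /<TC\s+([^>]*?)\s*\/?>/;
    my %attrs = $1 =~ /([\w-]+)="([^"]*)"/g;   # name="value" pairs
    return \%attrs;
}

# Inline sample; for the real file, open F:\Sample.xml and loop over its lines.
my @sample = (
    '<TS type="TestFixture" name="Admin" result="Success">',
    '  <TC name="testName1" executed="True" result="Success" time="0.352" />',
    '  <TC name="testName2" executed="True" result="Success" time="0.005" />',
);

# Non-matching lines return an empty list, so map keeps only TC hash refs.
my @tcs = map { tc_attributes($_) } @sample;
for my $tc (@tcs) {
    print join(', ', map { $_ . '=' . $tc->{$_} } sort keys %$tc), "\n";
}
```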
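What you get from an XML parser is exactly what you dumped, and that is what you should expect to get, so I'm not sure what you are expecting instead. A flat structure with no nesting?
Don't use XML::Simple. It's a misnomer: it is only simple for simple XML.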
The use of this module in new code is discouraged.
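Try XML::Twig.
Part of your problem is simply that you have a deeply nested XML structure, and there are only so many ways to "display" that.
What almost every XML parser does is turn your XML into a Perl data structure, usually a hash. What it will usually also let you do, though, is print that structure back out as proper XML.
So for a simple reformatting task, XML::Twig lets you do this: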
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Handler: called once for every TC element; prints all of its attributes.
sub handle_tc {
    my ( $twig, $tc ) = @_;
    foreach my $attr ( sort keys %{ $tc->atts() } ) {
        print "$attr = " . $tc->att($attr) . "\n";
    }
    print "\n";
}

my $twig_parser = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { 'TC' => \&handle_tc },
)->parsefile('F:\Sample.xml');    # parse the source XML, not the Dumper output

print "\n\nWhole XML pretty_print\n\n";
$twig_parser->print;
This prints every attribute of each 'TC' element. Each time the parser finishes a TC element, the handler is called with that subset of the XML.
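Because the handler runs as soon as each TC element is complete, it can also look up the enclosing fixture and then discard what has been parsed so far, which keeps memory flat on the "much larger" file mentioned in the question. A minimal sketch, assuming XML::Twig is installed and using an inline sample instead of the real file; `@results` and the output format are illustrative choices, not part of the original answer:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my @results;

# Called once per TC element, right after its end tag has been parsed.
sub handle_tc {
    my ($twig, $tc) = @_;
    my $fixture = $tc->parent('TS');   # nearest enclosing TS element
    push @results, sprintf '%s/%s: %s',
        ($fixture ? $fixture->att('name') : '?'),
        $tc->att('name'), $tc->att('result');
    $twig->purge;   # drop completed elements; still-open ancestors are kept
    return 1;
}

my $xml = <<'XML';
<TR name="App.exe">
  <TS type="TestFixture" name="Client">
    <RS>
      <TC name="testName3" result="Success" />
      <TC name="testName4" result="Success" />
    </RS>
  </TS>
</TR>
XML

XML::Twig->new(twig_handlers => { TC => \&handle_tc })->parse($xml);
print "$_\n" for @results;
```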
For comparison, $twig_parser->print reformats the whole document according to the 'pretty_print' option (though given your source XML, it probably won't change much).