XML::Twig - identifying blobs that do not contain an element
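I am using XML::Twig to parse the response of the Azure list-blob REST API.
Specifically, I want to identify and delete uncommitted, orphaned blobs, but I am not sure how best to use XML::Twig to do this efficiently. I don't even know where to start.
Ultimately I need to retrieve the <Name> element of each orphaned blob.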
Uncommitted Blobs in the Response
Uncommitted blobs are listed in the response only if the include=uncommittedblobs parameter was specified on the URI. Uncommitted blobs listed in the response do not include any of the following elements:
Last-Modified, Etag, Content-Type, Content-Encoding, Content-Language, Content-MD5, Cache-Control, Metadata
So in the simplified example below you can see one orphaned blob named "test", because its <Blob></Blob> block contains none of the elements listed above.
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my**account.blob.core.windows.net/"
ContainerName="testonly">
<Blobs>
<Blob>
<Name>test</Name>
<Properties>
<Content-Length>0</Content-Length>
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
<LeaseState>available</LeaseState>
</Properties>
</Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>
Update:
Actually, I may have oversimplified. The accepted answer does not seem to work on the following data; it prints everything:
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my**account.blob.core.windows.net/" ContainerName="testonly">
<Blobs>
<Blob>
<Name>data/users/docx</Name>
<Properties>
<Last-Modified>Wed, 10 May 2017 20:21:25 GMT</Last-Modified>
<Etag>0x8D497E221E7A5AF</Etag>
<Content-Length>125632</Content-Length>
<Content-Type>application/octet-stream</Content-Type>
<Content-Encoding/>
<Content-Language/>
<Content-MD5/>
<Cache-Control/>
<Content-Disposition/>
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
<LeaseState>available</LeaseState>
</Properties>
</Blob>
<Blob>
<Name>test</Name>
<Properties>
<Content-Length>0</Content-Length>
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
<LeaseState>available</LeaseState>
</Properties>
</Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>
My code:
sub blob_parse {
my $blob = $_;
$blob->first_child($_) and return
for qw( Last-Modified Etag Content-Type Content-Encoding
Content-Language Content-MD5 Cache-Control Metadata);
say "orph: ".$blob->first_child('Name')->text;
}
sub parseAndDelete {
### ORPHAN
$twig_handlers = {'Blobs/Blob' => \&blob_parse};
$twig = new XML::Twig(twig_handlers=>$twig_handlers);
$twig->parse($message);
}
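Update
There is no reason to use the callback system that XML::Twig provides unless your XML data is so large that the corresponding data structure consumes too much memory, which is unlikely for a message fetched from the internet.
I would implement it like this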
use strict;
use warnings;
use feature 'say';
use XML::Twig;
use List::Util 'none';
my @unwanted = qw/
Last-Modified Etag Content-Type Content-Encoding
Content-Language Content-MD5 Cache-Control Metadata
/;
my $twig = 'XML::Twig'->new;
$twig->parsefile('blob.xml');
for my $blob ( $twig->find_nodes('Blobs/Blob') ) {
if ( none { $blob->find_nodes("Properties/$_") } @unwanted ) {
say $_->text for $blob->find_nodes('Name');
}
}
Output
test
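Since the end goal is to delete the orphans, it may help to return the names as a list rather than print them. The following is only a sketch under assumptions: the helper name orphan_names is hypothetical, the sample is a trimmed copy of the data in the question, and the check looks only inside <Properties>, as the code above does.

```perl
use strict;
use warnings;
use feature 'say';
use XML::Twig;
use List::Util 'any';

# Hypothetical helper: returns the names of blobs whose <Properties>
# contains none of the elements that only committed blobs carry.
sub orphan_names {
    my ($xml) = @_;

    my @committed_only = qw(
        Last-Modified Etag Content-Type Content-Encoding
        Content-Language Content-MD5 Cache-Control Metadata
    );

    my $twig = XML::Twig->new;
    $twig->parse($xml);

    my @names;
    for my $blob ( $twig->root->descendants('Blob') ) {
        my $props = $blob->first_child('Properties');

        # Skip the blob as soon as one committed-only element is found.
        next if $props && any { $props->first_child($_) } @committed_only;

        push @names, $blob->first_child_text('Name');
    }
    return @names;
}

# Trimmed sample based on the data in the question.
my $sample = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="testonly">
<Blobs>
<Blob>
<Name>data/users/docx</Name>
<Properties>
<Etag>0x8D497E221E7A5AF</Etag>
<Content-Length>125632</Content-Length>
</Properties>
</Blob>
<Blob>
<Name>test</Name>
<Properties>
<Content-Length>0</Content-Length>
<BlobType>BlockBlob</BlobType>
</Properties>
</Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>
XML

say for orphan_names($sample);
```

The returned list can then be handed to whatever code issues the delete requests; that part is outside the scope of this sketch.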
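If your XML is actually well-formed and it is your sample data that is in error, then printing the text content of all Name elements is simple.
I used this data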
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my**account.blob.core.windows.net/"
ContainerName="testonly">
<Blobs>
<Blob>
<Name>test</Name>
<Properties>
<Content-Length>0</Content-Length>
<BlobType>BlockBlob</BlobType>
<LeaseStatus>unlocked</LeaseStatus>
<LeaseState>available</LeaseState>
</Properties>
</Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>
Perl
use strict;
use warnings 'all';
use feature 'say';
use XML::Twig;
my $t = XML::Twig->new;
$t->parsefile( 'blob.xml');
say $_->text for $t->find_nodes('Blobs/Blob/Name');
Output
test
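Just create a handler for Blob that does nothing if any of the listed elements is present, and prints the name otherwise. Use the first_child method to inspect the inner structure of the blob.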
#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use XML::Twig;
my $xml = '...';
my $twig = 'XML::Twig'->new(twig_handlers => {
Blob => sub {
my $properties = $_->first_child('Properties');
$properties->first_child($_) and return
for qw( Last-Modified Etag Content-Type Content-Encoding
Content-Language Content-MD5 Cache-Control Metadata
);
say $_->first_child('Name')->text;
},
});
$twig->parse($xml);
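If a listing ever did grow large enough for memory to matter, the same handler could release each blob once it has been inspected. This is a minimal sketch, not a tested production routine: the @orphans array and the trimmed inline sample are assumptions made for illustration, and purge is the XML::Twig method that discards everything parsed so far.

```perl
use strict;
use warnings;
use feature 'say';
use XML::Twig;

# Elements that only committed blobs carry inside <Properties>.
my @committed_only = qw(
    Last-Modified Etag Content-Type Content-Encoding
    Content-Language Content-MD5 Cache-Control Metadata
);

# Trimmed sample based on the data in the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="testonly">
<Blobs>
<Blob>
<Name>data/users/docx</Name>
<Properties>
<Last-Modified>Wed, 10 May 2017 20:21:25 GMT</Last-Modified>
<Content-Length>125632</Content-Length>
</Properties>
</Blob>
<Blob>
<Name>test</Name>
<Properties>
<Content-Length>0</Content-Length>
</Properties>
</Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>
XML

my @orphans;    # names collected while parsing

my $twig = XML::Twig->new(
    twig_handlers => {
        Blob => sub {
            my ( $t, $blob ) = @_;
            my $props = $blob->first_child('Properties');

            # Count the committed-only elements found under <Properties>.
            my $committed = 0;
            $committed = grep { $props->first_child($_) } @committed_only
                if $props;

            push @orphans, $blob->first_child_text('Name') unless $committed;

            # Discard what has been parsed so far to keep memory flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);

say for @orphans;
```

The name is copied out before the call to purge, because the element itself is no longer available afterwards.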