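What would be the best approach to work with two arrays of hashes in this scenario?
What is the best way to work with these two arrays of hashes? The first data set comes from XML data and the second from a CSV file. The idea is to check whether each file name in the second data set appears in the first and, if so, to calculate the delay of the file transfer. I am not sure how best to build a workable hash (either change the existing structures to use the file name as their key, or merge them together somehow). Any feedback would be appreciated.
Data set 1 (XML data):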
$VAR1 = [
{
'StartTimestamp' => 1478146371,
'EndTimestamp' => 1478149167,
'FileName' => 'a3_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478146375,
'EndTimestamp' => 1478149907,
'FileName' => 'a2_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478161030,
'EndTimestamp' => 1478161234,
'FileName' => 'file_DEX_0.req',
'Stage' => 'SentUserResponse'
},
Data set 2, from the CSV file:
$VAR1 = [
{
'FileName' => 'a3_file_20161024.req',
'ExpectedTime' => '20:04:07'
},
{
'FileName' => 'a2_file_20161024.req',
'ExpectedTime' => '20:14:39'
},
{
'FileName' => 'file_DEX_0.req',
'ExpectedTime' => '20:48:40'
},
Code used:
sub Demo {
    my $api_ref = GetData($apicall);
    my $csvdata = ReadDataFile();
    print Dumper($api_ref);
    print "-------------------------*********--------------************------------------\n";
    print Dumper ($csvdata);
    print "#####################\n";
}
sub ReadDataFile {
    my $parser = Text::CSV::Simple->new;
    $parser->field_map(qw/FileName ExpectedTime/);
    my @csv_data = $parser->read_file($datafile);
    return \@csv_data;
}
sub GetData {
    my ($xml) = @_;
    my @api_data;
    my %request;
    my $t = XML::Twig->new(
        twig_handlers => {
            '//UserRequest' => sub {
                push @api_data, {%request} if %request;
                %request = ();
                $_->purge;    # free memory
            },
            '//UserRequest/HomeFileName' => sub {
                $request{FileName} = $_->trimmed_text;
            },
            '//UserRequest/Stage' => sub {
                $request{Stage} = $_->trimmed_text;
            },
            '//UserRequest/StartTimestamp' => sub {
                $request{StartTimestamp} = str2time(substr($_->trimmed_text, -8));
            },
            '//UserRequest/EndTimestamp' => sub {
                $request{EndTimestamp} = str2time(substr($_->trimmed_text, -8));
            },
        },
    );
    $t->xparse($xml);
    $t->purge;
    return \@api_data;
}
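Assuming that you can map the elements of the first array to the elements of the second by comparing file names, and that the relation is 1:1, I would take the following steps:
- sort the lists by file name, or generate an index hash
- combine both sets into one array of hashes, or use the index to work with your data sets
- do whatever you want with the data set
A small example: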
#!/usr/bin/env perl
use strict;
use warnings;
my $api_ref = [
    {
        'StartTimestamp' => 1478146371,
        'EndTimestamp'   => 1478149167,
        'FileName'       => 'a3_file_20161024.req',
        'Stage'          => 'SentUserResponse'
    },
    {
        'StartTimestamp' => 1478146375,
        'EndTimestamp'   => 1478149907,
        'FileName'       => 'a2_file_20161024.req',
        'Stage'          => 'SentUserResponse'
    },
    {
        'StartTimestamp' => 1478161030,
        'EndTimestamp'   => 1478161234,
        'FileName'       => 'file_DEX_0.req',
        'Stage'          => 'SentUserResponse'
    }
];
my $csvdata = [
    {
        'FileName'     => 'a3_file_20161024.req',
        'ExpectedTime' => '20:04:07'
    },
    {
        'FileName'     => 'a2_file_20161024.req',
        'ExpectedTime' => '20:14:39'
    },
    {
        'FileName'     => 'file_DEX_0.req',
        'ExpectedTime' => '20:48:40'
    }
];
# generate the index: file name => position in each array
my %index = ();
for ( my $i = 0 ; $i < @{$api_ref} ; $i++ ) {
    $index{ $api_ref->[$i]{FileName} }{api_idx} = $i;
}
for ( my $i = 0 ; $i < @{$csvdata} ; $i++ ) {
    $index{ $csvdata->[$i]{FileName} }{csv_idx} = $i;
}
# keep only the file names present in both data sets
my @filename_intersection =
  grep { exists $index{$_}{api_idx} && exists $index{$_}{csv_idx} }
  ( sort keys %index );
foreach my $filename (@filename_intersection) {
    # look up the matching entry in each data set
    my $api_entry = $api_ref->[ $index{$filename}{api_idx} ];
    my $csv_entry = $csvdata->[ $index{$filename}{csv_idx} ];
    # example: convert ExpectedTime into seconds and compare it to the Start/End time difference
    # (entries whose ExpectedTime is not HH:MM:SS are skipped)
    next unless $csv_entry->{ExpectedTime} =~ /^(\d{2}):(\d{2}):(\d{2})$/;
    my $exp_sec  = ( $1 * 60 + $2 ) * 60 + $3;
    my $real_sec = $api_entry->{EndTimestamp} - $api_entry->{StartTimestamp};
    my $msg      = "";
    if ( $exp_sec >= $real_sec ) {
        $msg = "in time";
    }
    else {
        $msg = "late";
    }
    printf
      "Filename %s was %s; expected time: %d seconds, real time: %d seconds\n",
      $filename, $msg, $exp_sec, $real_sec;
}
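The other option from the list of steps, combining both sets into a single structure keyed by file name, could look roughly like the sketch below. It is only an illustration: the sample records are shortened, `not_in_api.req` is a made-up file name used to show an unmatched row, the names `%api_by_name` and `%merged` are my own, and it assumes file names are unique within each data set.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as the two data sets above.
my $api_ref = [
    { FileName => 'a3_file_20161024.req', StartTimestamp => 1478146371, EndTimestamp => 1478149167 },
    { FileName => 'file_DEX_0.req',       StartTimestamp => 1478161030, EndTimestamp => 1478161234 },
];
my $csvdata = [
    { FileName => 'a3_file_20161024.req', ExpectedTime => '20:04:07' },
    { FileName => 'not_in_api.req',       ExpectedTime => '21:00:00' },    # no API match
];

# Re-key the API records by file name (assumes file names are unique).
my %api_by_name;
$api_by_name{ $_->{FileName} } = $_ for @{$api_ref};

# Merge each CSV row with its API record; CSV rows without a match are skipped.
my %merged;
for my $row ( @{$csvdata} ) {
    my $api = $api_by_name{ $row->{FileName} } or next;
    $merged{ $row->{FileName} } = { %{$api}, %{$row} };
}

# Each merged record now carries the fields of both sources.
for my $name ( sort keys %merged ) {
    my $rec = $merged{$name};
    printf "%s: expected %s, took %d seconds\n",
      $name, $rec->{ExpectedTime},
      $rec->{EndTimestamp} - $rec->{StartTimestamp};
}
```

Compared with the index approach, this copies the fields into new hashes, so later changes to the original records are not reflected in `%merged`; whether that matters depends on what you do with the data afterwards.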
Best,
Frank