YAML 与自定义方法合并
YAML merging with custom methods
我正在尝试合并两个 yml:-
use Hash::Merge qw( merge );
use YAML;
$file1 = a.yml
$file2 = b.yml
my $load1=&YAML::LoadFile($file1);
my $load2=&YAML::LoadFile($file2);
my $merge_data = merge($load1, $load2);
my $out_yml = Dump $final_soc_cfg_request;
我在自定义合并机制中遗漏了什么吗?
以下是如何手动合并哈希的示例:
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
use List::Util qw(uniq);
use YAML;
{
my $file1 = 'a.yml';
my $file2 = 'b.yml';
my $load1= YAML::LoadFile($file1);
my $load2= YAML::LoadFile($file2);
my $merge_data = merge($load1, $load2);
}
sub merge($hash1, $hash2) {
my @hashes = ($hash1, $hash2);
my %save;
for my $hash (@hashes) {
my $array = get_hash_value($hash,"tool_pointer", "ARRAY");
for my $sub_hash (@$array) {
my $name = get_hash_value($sub_hash, "name");
my $project = get_hash_value($sub_hash, "project");
my $source = get_hash_value($sub_hash, "source");
my $key = join $;, $name, $project, $source;
my $tools = get_hash_value($sub_hash, "tools", "ARRAY");
my @tool_names;
for my $item (@$tools) {
my $name = get_hash_value($item, "name");
push @tool_names, $name;
}
push @{ $save{$key} }, @tool_names;
}
}
my @result;
for my $key (keys %save) {
my ($name, $project, $source) = split $;, $key;
my $names = $save{$key};
my @tools = map { +{ name => $_} } uniq @$names;
push @result, {name => $name,
project => $project,
source => $source,
tools => \@tools };
}
return {tool_pointer => \@result};
}
sub get_hash_value($hash, $key, $ref="") {
die "Not a hash ref\n" if ref $hash ne "HASH";
die "Hash key '$key' does not exist\n" if !exists $hash->{$key};
my $value = $hash->{$key};
die "Unexpected hash value\n" if ref $value ne $ref;
return $value;
}
对于问题中显示的数据,这是使用 Hash::Merge::add_behavior_spec 的一种方法。
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
use Storable qw(dclone);
use YAML qw(LoadFile);
use Hash::Merge qw(merge);
die "Usage: [=10=] file1 file2\n" if @ARGV != 2;
my ($fname1, $fname2) = @ARGV;
my $yml1 = LoadFile($fname1);
my $yml2 = LoadFile($fname2);
# Naive merge doesn't do all that's wanted
#my $r = merge($yml1, $yml2); dd $r;
Hash::Merge::add_behavior_spec( {
'SCALAR' => {
'SCALAR' => sub { $_[0] },
'ARRAY' => sub { $_[0] },
'HASH' => sub { $_[0] },
},
'ARRAY' => {
'SCALAR' => sub { [ @{ $_[0] }, $_[1] ] },
'ARRAY' => sub {
# For each pair of hashrefs check whether three (of four) keys
# have the same values; if so, merge them. If not then add them.
my @res;
# Scan: Which elements in each array to merge, and which with which
my (%m1, %m2, %merge);
my @a1 = @{$_[0]};
my @a2 = @{$_[1]};
I1: foreach my $i1 (0..$#a1) {
I2: foreach my $i2 (0..$#a2) {
next if exists $m1{$i1} or exists $m2{$i2};
if (ref $a1[$i1] eq 'HASH' and ref $a2[$i2] eq 'HASH') {
for (qw(name project source)) {
next I2 if $a1[$i1]->{$_} ne $a2[$i2]->{$_};
}
}
# Three key-pairs are same so fourth ones need be merged
$m1{$i1} = $m2{$i2} = 1;
$merge{$i1} = $i2;
}
}
# Now assemble/merge components as marked above
my (%added_1, %added_2); # more bookkeeping needed :(
A1: foreach my $i1 (0..$#a1) {
A2: foreach my $i2 (0..$#a2) {
next A2 if (exists $m1{$i1} and exists $m2{$i2})
or exists $added_2{$i2};
push @res, dclone $a2[$i2];
$added_2{$i2} = 1;
}
next A1 if exists $m1{$i1} or exists $added_1{$i1};
push @res, dclone $a1[$i1];
$added_1{$i1} = 1;
}
foreach my $i (keys %merge) {
push @res,
Hash::Merge::_merge_hashes($a1[$i], $a2[$merge{$i}]);
}
\@res;
},
'HASH' => sub { [ @{ $_[0] }, values %{ $_[1] } ] },
},
'HASH' => {
'SCALAR' => sub { $_[0] },
'ARRAY' => sub { $_[0] },
'HASH' => sub { Hash::Merge::_merge_hashes( $_[0], $_[1] ) },
},
}, 'Merge Arrayrefs Recursively As Well',);
my $res = merge($yml1, $yml2);
dd $res;
每个要合并的数据结构都是一个包含哈希引用的数组引用。当它们被模块合并时,hashrefs 只是集中到一个 arrayref 中,而可以想象该 arrayref 中的一些 hashrefs 可以进一步合并。†
这在上面的 ARRAY
-to-ARRAY
规则中得到了补救,其中检查哈希引用对是否应该合并,如果是则标记。合并的标准如下
每个 hahref 都有三个带有字符串值的键,另一个带有 arrayref 值。如果这三个键值对相同,那么 hashrefs 应该与三个键值对合二为一,而它们的第四个键的 arrayrefs 应该合并为一个,即该键的值。
然后在第二遍中,那些被标记为合并的将受到 _merge_hashes
例程的约束,而其他则被添加到结果数组中。
这会打印
{
tool_pointer => [
{
name => "tool_gen.config",
project => "TOT",
source => "etc",
tools => [{ name => "vipcat" }, { name => "log" }],
},
{
name => "tool_log.config",
project => "TOT",
source => "etc",
tools => [
{ name => "xc" },
{ name => "test" },
{ name => "vr" },
{ name => "arbgen2" },
],
},
],
}
我用的是Data::Dump to display complex data structures, for its default simplicity and conciseness. (My choice; there are others of course, with Data::Dumper在内核中,所以已经安装了。)
这只是一个演示,因为我已经简化了周围的事情,还使用固定的硬编码键名称列表,使其适用于显示的数据集。请填写您真实数据的详细信息。
请注意,这必须为每个单独的数据集重新编码,因此必须对输入数据的任何更改进行审查,并可能进行调整或显着更改。
另见
和 this post 其他几个例子。
† 不过不怪模块。无法分析数组元素是否或如何合并它们,因为没有通用的标准。因此,然后将它们简单地添加到结果数组中,对于任何更具体的需求,有 add_behavior_spec
.
我正在尝试合并两个 yml:-
use Hash::Merge qw( merge );
use YAML;
$file1 = a.yml
$file2 = b.yml
my $load1=&YAML::LoadFile($file1);
my $load2=&YAML::LoadFile($file2);
my $merge_data = merge($load1, $load2);
my $out_yml = Dump $final_soc_cfg_request;
我在自定义合并机制中遗漏了什么吗?
以下是如何手动合并哈希的示例:
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
use List::Util qw(uniq);
use YAML;
{
my $file1 = 'a.yml';
my $file2 = 'b.yml';
my $load1= YAML::LoadFile($file1);
my $load2= YAML::LoadFile($file2);
my $merge_data = merge($load1, $load2);
}
sub merge($hash1, $hash2) {
my @hashes = ($hash1, $hash2);
my %save;
for my $hash (@hashes) {
my $array = get_hash_value($hash,"tool_pointer", "ARRAY");
for my $sub_hash (@$array) {
my $name = get_hash_value($sub_hash, "name");
my $project = get_hash_value($sub_hash, "project");
my $source = get_hash_value($sub_hash, "source");
my $key = join $;, $name, $project, $source;
my $tools = get_hash_value($sub_hash, "tools", "ARRAY");
my @tool_names;
for my $item (@$tools) {
my $name = get_hash_value($item, "name");
push @tool_names, $name;
}
push @{ $save{$key} }, @tool_names;
}
}
my @result;
for my $key (keys %save) {
my ($name, $project, $source) = split $;, $key;
my $names = $save{$key};
my @tools = map { +{ name => $_} } uniq @$names;
push @result, {name => $name,
project => $project,
source => $source,
tools => \@tools };
}
return {tool_pointer => \@result};
}
sub get_hash_value($hash, $key, $ref="") {
die "Not a hash ref\n" if ref $hash ne "HASH";
die "Hash key '$key' does not exist\n" if !exists $hash->{$key};
my $value = $hash->{$key};
die "Unexpected hash value\n" if ref $value ne $ref;
return $value;
}
对于问题中显示的数据,这是使用 Hash::Merge::add_behavior_spec 的一种方法。
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd pp);
use Storable qw(dclone);
use YAML qw(LoadFile);
use Hash::Merge qw(merge);
die "Usage: [=10=] file1 file2\n" if @ARGV != 2;
my ($fname1, $fname2) = @ARGV;
my $yml1 = LoadFile($fname1);
my $yml2 = LoadFile($fname2);
# Naive merge doesn't do all that's wanted
#my $r = merge($yml1, $yml2); dd $r;
Hash::Merge::add_behavior_spec( {
'SCALAR' => {
'SCALAR' => sub { $_[0] },
'ARRAY' => sub { $_[0] },
'HASH' => sub { $_[0] },
},
'ARRAY' => {
'SCALAR' => sub { [ @{ $_[0] }, $_[1] ] },
'ARRAY' => sub {
# For each pair of hashrefs check whether three (of four) keys
# have the same values; if so, merge them. If not then add them.
my @res;
# Scan: Which elements in each array to merge, and which with which
my (%m1, %m2, %merge);
my @a1 = @{$_[0]};
my @a2 = @{$_[1]};
I1: foreach my $i1 (0..$#a1) {
I2: foreach my $i2 (0..$#a2) {
next if exists $m1{$i1} or exists $m2{$i2};
if (ref $a1[$i1] eq 'HASH' and ref $a2[$i2] eq 'HASH') {
for (qw(name project source)) {
next I2 if $a1[$i1]->{$_} ne $a2[$i2]->{$_};
}
}
# Three key-pairs are same so fourth ones need be merged
$m1{$i1} = $m2{$i2} = 1;
$merge{$i1} = $i2;
}
}
# Now assemble/merge components as marked above
my (%added_1, %added_2); # more bookkeeping needed :(
A1: foreach my $i1 (0..$#a1) {
A2: foreach my $i2 (0..$#a2) {
next A2 if (exists $m1{$i1} and exists $m2{$i2})
or exists $added_2{$i2};
push @res, dclone $a2[$i2];
$added_2{$i2} = 1;
}
next A1 if exists $m1{$i1} or exists $added_1{$i1};
push @res, dclone $a1[$i1];
$added_1{$i1} = 1;
}
foreach my $i (keys %merge) {
push @res,
Hash::Merge::_merge_hashes($a1[$i], $a2[$merge{$i}]);
}
\@res;
},
'HASH' => sub { [ @{ $_[0] }, values %{ $_[1] } ] },
},
'HASH' => {
'SCALAR' => sub { $_[0] },
'ARRAY' => sub { $_[0] },
'HASH' => sub { Hash::Merge::_merge_hashes( $_[0], $_[1] ) },
},
}, 'Merge Arrayrefs Recursively As Well',);
my $res = merge($yml1, $yml2);
dd $res;
每个要合并的数据结构都是一个包含哈希引用的数组引用。当它们被模块合并时,hashrefs 只是集中到一个 arrayref 中,而可以想象该 arrayref 中的一些 hashrefs 可以进一步合并。†
这在上面的 ARRAY
-to-ARRAY
规则中得到了补救,其中检查哈希引用对是否应该合并,如果是则标记。合并的标准如下
每个 hahref 都有三个带有字符串值的键,另一个带有 arrayref 值。如果这三个键值对相同,那么 hashrefs 应该与三个键值对合二为一,而它们的第四个键的 arrayrefs 应该合并为一个,即该键的值。
然后在第二遍中,那些被标记为合并的将受到 _merge_hashes
例程的约束,而其他则被添加到结果数组中。
这会打印
{ tool_pointer => [ { name => "tool_gen.config", project => "TOT", source => "etc", tools => [{ name => "vipcat" }, { name => "log" }], }, { name => "tool_log.config", project => "TOT", source => "etc", tools => [ { name => "xc" }, { name => "test" }, { name => "vr" }, { name => "arbgen2" }, ], }, ], }
我用的是Data::Dump to display complex data structures, for its default simplicity and conciseness. (My choice; there are others of course, with Data::Dumper在内核中,所以已经安装了。)
这只是一个演示,因为我已经简化了周围的事情,还使用固定的硬编码键名称列表,使其适用于显示的数据集。请填写您真实数据的详细信息。
请注意,这必须为每个单独的数据集重新编码,因此必须对输入数据的任何更改进行审查,并可能进行调整或显着更改。
另见
† 不过不怪模块。无法分析数组元素是否或如何合并它们,因为没有通用的标准。因此,然后将它们简单地添加到结果数组中,对于任何更具体的需求,有 add_behavior_spec
.