YAML 与自定义方法合并

Question

我正在尝试合并两个 yml：-

use Hash::Merge qw( merge );
use YAML;
$file1 = a.yml
$file2 = b.yml
my $load1=&YAML::LoadFile($file1);
my $load2=&YAML::LoadFile($file2);
my $merge_data = merge($load1, $load2);
my $out_yml = Dump $final_soc_cfg_request;

我在自定义合并机制中遗漏了什么吗？

Answer 1

以下是如何手动合并哈希的示例：

use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
use List::Util qw(uniq);
use YAML;
{
    my $file1 = 'a.yml';
    my $file2 = 'b.yml';
    my $load1= YAML::LoadFile($file1);
    my $load2= YAML::LoadFile($file2);
    my $merge_data = merge($load1, $load2);
}

sub merge($hash1, $hash2) {
    my @hashes = ($hash1, $hash2);
    my %save;
    for my $hash (@hashes) {
        my $array = get_hash_value($hash,"tool_pointer", "ARRAY");
        for my $sub_hash (@$array) {
            my $name = get_hash_value($sub_hash, "name");
            my $project = get_hash_value($sub_hash, "project");
            my $source = get_hash_value($sub_hash, "source");
            my $key = join $;, $name, $project, $source;
            my $tools = get_hash_value($sub_hash, "tools", "ARRAY");
            my @tool_names;
            for my $item (@$tools) {
                my $name = get_hash_value($item, "name");
                push @tool_names, $name;
            }
            push @{ $save{$key} }, @tool_names;
        }
    }
    my @result;
    for my $key (keys %save) {
        my ($name, $project, $source) = split $;, $key;
        my $names = $save{$key};
        my @tools = map { +{ name => $_} } uniq @$names;
        push @result, {name => $name,
                       project => $project,
                       source => $source,
                       tools => \@tools };
    }
    return {tool_pointer => \@result};
}

sub get_hash_value($hash, $key, $ref="") {
    die "Not a hash ref\n" if ref $hash ne "HASH";
    die "Hash key '$key' does not exist\n" if !exists $hash->{$key};
    my $value = $hash->{$key};
    die "Unexpected hash value\n" if ref $value ne $ref;
    return $value;
}

Answer 2

对于问题中显示的数据，这是使用 Hash::Merge::add_behavior_spec 的一种方法。

use warnings;
use strict;
use feature 'say';    
use Data::Dump qw(dd pp);
    
use Storable qw(dclone);
use YAML qw(LoadFile);
use Hash::Merge qw(merge);

die "Usage: [=10=] file1 file2\n" if @ARGV != 2;
my ($fname1, $fname2) = @ARGV;

my $yml1 = LoadFile($fname1);
my $yml2 = LoadFile($fname2);

# Naive merge doesn't do all that's wanted
#my $r = merge($yml1, $yml2); dd $r;

Hash::Merge::add_behavior_spec( {
    'SCALAR' => {
        'SCALAR' => sub { $_[0] },
        'ARRAY'  => sub { $_[0] },
        'HASH'   => sub { $_[0] },
    },
    'ARRAY' => {
        'SCALAR' => sub { [ @{ $_[0] }, $_[1] ] },
        'ARRAY'  => sub {
            # For each pair of hashrefs check whether three (of four) keys
            # have the same values; if so, merge them. If not then add them.
            my @res;
            # Scan: Which elements in each array to merge, and which with which
            my (%m1, %m2, %merge);
            my @a1 = @{$_[0]};
            my @a2 = @{$_[1]};
            I1: foreach my $i1 (0..$#a1) {
                I2: foreach my $i2 (0..$#a2) {
                    next if exists $m1{$i1} or exists $m2{$i2};
                    if (ref $a1[$i1] eq 'HASH' and ref $a2[$i2] eq 'HASH') {
                        for (qw(name project source)) {
                            next I2 if $a1[$i1]->{$_} ne $a2[$i2]->{$_};
                        }
                    }
                    # Three key-pairs are same so fourth ones need be merged
                    $m1{$i1} = $m2{$i2} = 1;
                    $merge{$i1} = $i2;
                }
            }

            # Now assemble/merge components as marked above
            my (%added_1, %added_2);       # more bookkeeping needed :(
            A1: foreach my $i1 (0..$#a1) {
                A2: foreach my $i2 (0..$#a2) {
                    next A2 if (exists $m1{$i1} and exists $m2{$i2})
                            or exists $added_2{$i2};
                    push @res, dclone $a2[$i2];
                    $added_2{$i2} = 1;
                }
                next A1 if exists $m1{$i1} or exists $added_1{$i1};   
                push @res, dclone $a1[$i1];
                $added_1{$i1} = 1;
            }

            foreach my $i (keys %merge) {
                push @res, 
                    Hash::Merge::_merge_hashes($a1[$i], $a2[$merge{$i}]);
            }
            \@res;
        },
        'HASH'  => sub { [ @{ $_[0] }, values %{ $_[1] } ] },
    },
    'HASH' => {
        'SCALAR' => sub { $_[0] },
        'ARRAY'  => sub { $_[0] },
        'HASH'   => sub { Hash::Merge::_merge_hashes( $_[0], $_[1] ) },
    },
}, 'Merge Arrayrefs Recursively As Well',);


my $res = merge($yml1, $yml2);

dd $res;

每个要合并的数据结构都是一个包含哈希引用的数组引用。当它们被模块合并时，hashrefs 只是集中到一个 arrayref 中，而可以想象该 arrayref 中的一些 hashrefs 可以进一步合并。^†

这在上面的 ARRAY-to-ARRAY 规则中得到了补救，其中检查哈希引用对是否应该合并，如果是则标记。合并的标准如下

每个 hahref 都有三个带有字符串值的键，另一个带有 arrayref 值。如果这三个键值对相同，那么 hashrefs 应该与三个键值对合二为一，而它们的第四个键的 arrayrefs 应该合并为一个，即该键的值。

然后在第二遍中，那些被标记为合并的将受到 _merge_hashes 例程的约束，而其他则被添加到结果数组中。

这会打印

{
  tool_pointer => [
    {
      name    => "tool_gen.config",
      project => "TOT",
      source  => "etc",
      tools   => [{ name => "vipcat" }, { name => "log" }],
    },
    {
      name    => "tool_log.config",
      project => "TOT",
      source  => "etc",
      tools   => [
                   { name => "xc" },
                   { name => "test" },
                   { name => "vr" },
                   { name => "arbgen2" },
                 ],
    },
  ],
}

我用的是Data::Dump to display complex data structures, for its default simplicity and conciseness. (My choice; there are others of course, with Data::Dumper在内核中，所以已经安装了。）

这只是一个演示，因为我已经简化了周围的事情，还使用固定的硬编码键名称列表，使其适用于显示的数据集。请填写您真实数据的详细信息。

请注意，这必须为每个单独的数据集重新编码，因此必须对输入数据的任何更改进行审查，并可能进行调整或显着更改。

另见和 this post 其他几个例子。

^† 不过不怪模块。无法分析数组元素是否或如何合并它们，因为没有通用的标准。因此，然后将它们简单地添加到结果数组中，对于任何更具体的需求，有 add_behavior_spec.

YAML 与自定义方法合并

YAML merging with custom methods

perl

data-structures