Find the difference between two Perl nested hashes

Question

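I am trying to find the difference between two files that contain key/values entries, and to return every key and value that was added or deleted. Currently I use the Linux diff command, but diff also reports a difference when only the order of the values changes. That is a valid diff, but I don't want such changes listed, because for my purposes they are not real changes.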

File 1:

key1    kamal1.google.com kamal2.google.com kamal3.google.com 
key2    kamal4.google.com

File 2:

key1    kamal1.google.com kamal6.google.com kamal3.google.com 
key3    kamal4.google.com

What I need:

Output such as: deleted key2 with value kamal4.google.com, added key3 with value kamal4.google.com, deleted kamal2.google.com from key1, added kamal6.google.com to key1.
The messages are only representative; they can be changed to something more meaningful.

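My approach:

Read each file into its own hash: key1 => {kamal1.google.com => 1, ...}, key2 => {kamal4.google.com => 1}. I also store each array of values as a hash, so that the comparison is efficient.
Iterate over the keys of both hashes and check which hash each key exists in.
Make a recursive call to find the differences in the values (since each value is again a hash).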
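
The idea behind storing the values as hash keys can be sketched in isolation. This is only an illustration under the assumption that each value list is treated as a set; the helper name `set_diff` is hypothetical and does not appear in the code of this question.

```perl
use strict;
use warnings;

# Hypothetical helper: both arguments are hash refs used as sets
# (value => 1). Returns two array refs: the keys found only in the
# old set and the keys found only in the new set.
sub set_diff {
    my ($old, $new) = @_;
    my @deleted = sort grep { !exists $new->{$_} } keys %$old;
    my @added   = sort grep { !exists $old->{$_} } keys %$new;
    return (\@deleted, \@added);
}

# Same values as key1 in the two sample files, listed in a different order.
my %old = map { $_ => 1 } qw(kamal1.google.com kamal2.google.com kamal3.google.com);
my %new = map { $_ => 1 } qw(kamal3.google.com kamal6.google.com kamal1.google.com);

my ($deleted, $added) = set_diff(\%old, \%new);
print "deleted: @$deleted\n";   # deleted: kamal2.google.com
print "added: @$added\n";       # added: kamal6.google.com
```

Because only key existence is tested, reordering the values in a file has no effect on the result, which is the behavior diff cannot provide.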

Problems with my code:
- It does not work for nested hashes.
- It loses track of the parent key.

Code:

use Data::Dumper;   # provides Dumper, used below

my $file1 = 'file1';
my $file2 = 'file2';

my $old = hashifyFile($file1);
my $new = hashifyFile($file2);
my $result = {};
compareHashes($old, $new, $result);
print Dumper $result;

sub compareHashes {
    my ($hash1, $hash2, $result) = @_;

    for my $key (keys %$hash1, keys %$hash2) {
        if (not exists $hash2->{$key}) {
            push @{$result->{deleted}->{$key}}, keys %{$hash1->{$key}};
        } elsif (not exists $hash1->{$key}) {
            push @{$result->{added}->{$key}}, keys %{$hash2->{$key}};
        } elsif (ref $hash1->{$key} eq 'HASH' or ref $hash2->{$key} eq 'HASH') {
            # the recursive call does not pass the parent key along
            compareHashes($hash1->{$key}, $hash2->{$key}, $result);
        }
    }
}

# helper functions
sub trim {
   my $val = shift;
   $val =~ s/^\s*|\s*$//g;
   return $val;
}


sub hashifyFile {
    my $file = shift;
    my $contents = {};
    open my $file_fh, '<', $file or die "couldn't open $file $!";

    my ($key, @val);
    while (my $line = <$file_fh>) {
        # skip blank lines and comments
        next if $line =~ /^\s*$/;
        next if $line =~ /^#/;
        # print "$. $line";

        # if the line starts with a word character, it is a "key values" line
        # if it starts with at least 4 spaces, it holds more values for the previous key
        if ($line =~ /^\w/) {
            ($key, @val) = split /\s+|=/, $line;
        } elsif ($line =~ /^\s{4,}\w/) {
            push @val, split /\s+/, $line;
        }
        my %temp_hash;
        for (@val) {
                # next unless $_;
                $temp_hash{trim($_)} = 1 if trim($_);
        }
        $key = trim($key);
        $contents->{$key} = \%temp_hash if defined $key;

    }

    close $file_fh;
    return $contents;
}

Answer 1

Based on your description, here is an example of how this could be done. Please let me know whether this is what you want.

sub compareHashes {
    my ($hash1, $hash2, $result, $parent) = @_;

    my %all_keys = map {$_ => 1} keys %$hash1, keys %$hash2;

    for my $key (keys %all_keys) {
        if (not exists $hash2->{$key}) {
            if ( defined $parent ) {
                push @{$result->{deleted}->{$parent}}, $key;
            }
            else {
                push @{$result->{deleted}->{$key}}, keys %{$hash1->{$key}};
            }
        } elsif (not exists $hash1->{$key}) {
            if ( defined $parent ) {
                push @{$result->{added}->{$parent}}, $key;
            }
            else {
                push @{$result->{added}->{$key}}, keys %{$hash2->{$key}};
            }
        }
        else {
            if ((ref $hash1->{$key} eq 'HASH') and (ref $hash2->{$key} eq 'HASH') ) {
                compareHashes($hash1->{$key}, $hash2->{$key}, $result, $key);
            }
        }
    }
}

Output:

$VAR1 = {
          'added' => {
                       'key3' => [
                                   'kamal4.google.com'
                                 ],
                       'key1' => [
                                   'kamal6.google.com'
                                 ]
                     },
          'deleted' => {
                         'key2' => [
                                     'kamal4.google.com'
                                   ],
                         'key1' => [
                                     'kamal2.google.com'
                                   ]
                       }
        };
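
Going back to the messages asked for in the question, the structure printed above could be turned into readable lines roughly as follows. This is a minimal sketch, not part of the original answer: the helper name `format_changes` and the message wording are assumptions. The structure does not record whether a whole key disappeared or only some of its values, so the wording is kept neutral.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref shaped like
# { added => { key => [values] }, deleted => { key => [values] } }
# and returns one message line per key, in a stable sorted order.
sub format_changes {
    my ($result) = @_;
    my @lines;
    for my $action (qw(deleted added)) {
        my $changes = $result->{$action} or next;   # skip absent sections
        for my $key (sort keys %$changes) {
            my $values = join ', ', sort @{ $changes->{$key} };
            push @lines, "$action under $key: $values";
        }
    }
    return @lines;
}

# The same data as in the Dumper output above.
my $result = {
    added   => { key3 => ['kamal4.google.com'], key1 => ['kamal6.google.com'] },
    deleted => { key2 => ['kamal4.google.com'], key1 => ['kamal2.google.com'] },
};
print "$_\n" for format_changes($result);
```

To tell "deleted key2" apart from "deleted a value from key1", the comparison sub would have to store that distinction itself, for example under separate top-level keys in the result.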

Answer 2

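There are several modules on CPAN that compare deeply nested data structures. They differ mainly in how they encode the differences. Here is a selection: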
