How to access a nested hash of arrays in a loop?

I have data in this format:

a1 1901 4
a1 1902 5
a3 1902 6
a4 1902 7
a4 1903 8
a5 1903 9

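I want to compute the cumulative score (third column) for each entity in the first column. So I tried building a hash, and my code looks like this: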

use strict;
use warnings;

use Data::Dumper;

my $file = shift;
open(DATA, '<', $file) or die "Cannot open '$file': $!";

my %hash;
while ( my $line = <DATA> ) {
  chomp $line;
  my ($protein, $year, $score) = split /\s+/, $line;
  push @{ $hash{$protein}{$year} }, $score;
}

print Dumper \%hash;

close DATA;

The output looks like this:

$VAR1 = {
          'a3' => {
                    '1902' => [
                                6
                              ]
                  },
          'a1' => {
                    '1902' => [
                                5
                              ],
                    '1901' => [
                                4
                              ]
                  },
          'a4' => {
                    '1903' => [
                                8
                              ],
                    '1902' => [
                                7
                              ]
                  },
          'a5' => {
                    '1903' => [
                                9
                              ]
                  }
        };

I now want to visit each entity in column 1 (a1, a3, a4, ...) and add up its scores, so the desired output would look like this:

a1 1901 4
a1 1902 9    # 4+5
a3 1902 6
a4 1902 7
a4 1903 16   # 7+9
a5 1903 9

But I can't figure out how to access the values of the hash I created in a loop so that I can add them up.

I think

a4 1903 16   # Sum of a4 1902 and a5 1903

should be

a4 1903 15   # Sum of a4 1902 and a4 1903

If so, this prints the running total per protein:

use feature qw( say );

my %scores_by_protein_and_year;
while (<DATA>) {
   my ($protein, $year, $score) = split;
   $scores_by_protein_and_year{$protein}{$year} = $score;
}

# Sort the proteins so the output order is deterministic.
for my $protein (sort keys(%scores_by_protein_and_year)) {
   my $scores_by_year = $scores_by_protein_and_year{$protein};
   my $score = 0;   # running total, reset for each protein
   for my $year (sort { $a <=> $b } keys(%$scores_by_year)) {
      $score += $scores_by_year->{$year};
      say "$protein $year $score";
   }
}

This works even if the data isn't grouped or sorted.
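If you would rather keep the structure from the question, where each `$hash{$protein}{$year}` holds an array reference built with `push`, the same traversal needs one more level of dereferencing. The following is only a minimal sketch: `running_totals` is a hypothetical helper name, and the hash literal is typed in by hand to mirror the sample data rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): walks a structure shaped like
# $hash{$protein}{$year} = [ score, ... ] and returns "protein year total" lines.
sub running_totals {
    my ($scores) = @_;
    my @lines;
    for my $protein (sort keys %$scores) {
        my $by_year = $scores->{$protein};    # inner hash: year => array ref
        my $total   = 0;                      # running total for this protein
        for my $year (sort { $a <=> $b } keys %$by_year) {
            # @{ ... } dereferences the array of scores stored for this year
            $total += $_ for @{ $by_year->{$year} };
            push @lines, "$protein $year $total";
        }
    }
    return @lines;
}

# Same shape that  push @{ $hash{$protein}{$year} }, $score  builds
# (assumed sample data, copied from the question's input)
my %hash = (
    a1 => { 1901 => [4], 1902 => [5] },
    a3 => { 1902 => [6] },
    a4 => { 1902 => [7], 1903 => [8] },
    a5 => { 1903 => [9] },
);

print "$_\n" for running_totals(\%hash);
```

Because every score for a given protein and year is kept in the array, repeated lines for the same pair are all added to the total instead of overwriting each other.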

If the data is always sorted the way you show it, you can process it as you read it from the file:

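my $current = '';   # protein seen on the previous line
my $total   = 0;    # running total for the current protein

while ( <DATA> ) {
    my ($protein, $year, $score) = split;

    # Start a new running total whenever the protein changes.
    $total = 0 unless $protein eq $current;
    $total += $score;

    print "$protein $year $total\n";

    $current = $protein;
}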

Output

a1 1901 4
a1 1902 9
a3 1902 6
a4 1902 7
a4 1903 15
a5 1903 9