How do I pass a variable number of arrays in a hash to the List::Compare perl module?

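I know this question has been asked here before (compare multiple hashes for common keys merge values). As far as I can tell, it was never answered. If you answer, please include an example that uses the List::Compare->new() constructor.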

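List::Compare can take multiple arrays as input. However, there is no example showing how to do this when you don't know in advance how many arrays will be passed to the constructor.

The example from the man page: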

$lcm = List::Compare->new(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);

Or...

You may use the 'single hashref' constructor format to build a List::Compare object that processes three or more lists at once:

$lcm = List::Compare->new( { lists => [\@Al, \@Bob, \@Carmen, \@Don, \@Ed], } );

$lcm = List::Compare->new( { 
        lists => [\@Al, \@Bob, \@Carmen, \@Don, \@Ed],
        unsorted => 1, } );

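I need to use the 'single hashref' constructor shown above, because I don't know how many lists (arrays) will be passed to the constructor. The closest I have come is: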

my %l;
my @a = ("fred", "barney", "pebbles", "bambam", "dino");
my @b = ("george", "jane", "elroy", "judy");
my @c = ("homer", "bart", "marge", "maggie");
my @d = ("fred", "barney", "pebbles", "bambam", "dino");
my @e = ("fred", "george", "jane", "elroy", "judy", "pebbles");

$l{'lists'}{'a'} = [ @a ];
$l{'lists'}{'b'} = [ @b ];
$l{'lists'}{'c'} = [ @c ];
$l{'lists'}{'d'} = [ @d ];
$l{'lists'}{'e'} = [ @e ];

my $lc = List::Compare->new(\%l);
my @intersection = $lc->get_intersection;
print @intersection . "\n";

I get:

Need to define 'lists' key properly: at /usr/local/share/perl5/List/Compare.pm line 21.

The Compare.pm code (line 21) is:

die "Need to define 'lists' key properly: $!"
        unless ( ${$argref}{'lists'}
             and (ref(${$argref}{'lists'}) eq 'ARRAY') );

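Can anyone tell me how to construct and name this hash from simple arrays? I need to be able to process a lot of different data in succession. Hundreds of arrays may be involved.


Update

@Borodin's answer is exactly what I needed. I apologize for the poor sample data; I was trying to come up with something concise. This is what I derived from that code: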

my @sets_to_process = qw( DOW SP500 has_trend_lines_day );
my @sets;
my $num_sets = $#sets_to_process;

for my $i (0 .. $num_sets) {
    my @set = get_ids_in_list( $dbh, $sets_to_process[$i] );
    push @sets, \@set;
}

my $lc = List::Compare->new(@sets);
my @intersection = $lc->get_intersection;

print "Sets:\n";
printf "  %s\n", join ', ', @$_ for @sets;
print "\n";

print "Intersection:\n";
printf "  %s\n", @intersection? join(', ', @intersection) : 'None';
print "\n";

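The problem with your parameter construction is that you have defined %l (which, by the way, is a terrible identifier) as a hash containing a hash of arrays, like this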

(
  lists => {
    a => ["fred", "barney", "pebbles", "bambam", "dino"],
    b => ["george", "jane", "elroy", "judy"],
    c => ["homer", "bart", "marge", "maggie"],
    d => ["fred", "barney", "pebbles", "bambam", "dino"],
    e => ["fred", "george", "jane", "elroy", "judy", "pebbles"],
  },
)

But the documentation is clear that it should be a simple hash containing an array of arrays

(
  lists => [
    ["fred", "barney", "pebbles", "bambam", "dino"],
    ["george", "jane", "elroy", "judy"],
    ["homer", "bart", "marge", "maggie"],
    ["fred", "barney", "pebbles", "bambam", "dino"],
    ["fred", "george", "jane", "elroy", "judy", "pebbles"],
  ],
)
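
If the lists really do have to start out in a named hash, a minimal sketch of the conversion could look like the following. The names %named, %rejected and %accepted are made up for illustration, and the data is a subset of the question's arrays. It only checks the shape of the structure that the constructor's `ref(...) eq 'ARRAY'` test looks at.

```perl
use strict;
use warnings;

# Named lists, shaped like the question's $l{lists}{...} (a hash of arrays)
my %named = (
    a => [qw( fred barney pebbles bambam dino )],
    d => [qw( fred barney pebbles bambam dino )],
    e => [qw( fred george jane elroy judy pebbles )],
);

# This is the shape the constructor's check rejects: 'lists' refers to a hash
my %rejected = ( lists => \%named );

# Drop the names so that 'lists' refers to an array of array references.
# Sorting the keys keeps the order of the lists predictable.
my @lists    = map { $named{$_} } sort keys %named;
my %accepted = ( lists => \@lists );

printf "rejected: %s, accepted: %s\n",
    ref( $rejected{lists} ), ref( $accepted{lists} );
# prints "rejected: HASH, accepted: ARRAY"
```

A reference to %accepted is then in the form that the 'single hashref' constructor quoted in the question expects, that is List::Compare->new(\%accepted).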

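Also, the fact that "[you] don't know how many lists (arrays) will be passed to the constructor" doesn't help your case, because all you are doing is pushing the problem into the data structure instead of keeping it at the parameter level

It is hard to help you with the data you have provided because the intersection is the empty set, so here is a sample program that you can experiment with. It generates five to ten sets of 17 random letters. Each time it is run it creates different data to process, and it passes the list of array references directly as parameters to the new constructor instead of using a reference to a hash with a single lists element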

use strict;
use warnings;

use List::Util 'shuffle';
use List::Compare;

my @sets;

my $num_sets = 5 + rand(6);

for (1 .. $num_sets) {
  my @set = (shuffle 'A' .. 'Z')[0..16];
  push @sets, [ sort @set ];
}

my $lc = List::Compare->new(@sets);
my @overlap = $lc->get_intersection;

print "Sets:\n";
printf "  %s\n", join ', ', @$_ for @sets;
print "\n";

print "Intersection:\n";
printf "  %s\n", @overlap ? join(', ', @overlap) : 'None';
print "\n";

Sample output

Sets:
  B, C, D, E, F, G, K, L, M, O, P, Q, S, U, V, W, X
  B, C, D, F, G, I, J, L, M, P, R, T, U, V, W, X, Y
  A, B, C, D, F, G, H, K, L, M, O, R, T, U, V, W, Y
  A, B, D, G, H, I, K, L, M, O, R, T, U, V, W, Y, Z
  A, B, C, D, E, F, H, J, K, L, M, P, Q, S, U, V, Z

Intersection:
  B, D, L, M, U, V

Sets:
  A, B, C, D, F, J, K, L, M, N, Q, R, U, V, W, X, Y
  A, E, F, G, H, I, J, L, O, P, Q, R, S, T, V, X, Z
  B, E, G, H, J, K, L, M, N, P, S, T, U, V, W, Y, Z
  B, C, D, E, F, G, H, I, J, N, O, Q, R, T, V, W, Z
  A, B, C, E, F, G, H, I, L, N, O, Q, T, U, W, X, Y

Intersection:
  None

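Update

Regarding the updated code in your question, your identifier $num_sets is misnamed, because it is the index of the last element of @sets_to_process, or one less than the actual number of sets

If you want to use a variable then you should say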

my $num_sets = @sets_to_process;

and then loop like this

for my $i ( 0 .. $num_sets-1 ) { ... }
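
To make the off-by-one concrete, here is a small sketch using the set names from the question. The variable name $last_index is introduced here purely for illustration.

```perl
use strict;
use warnings;

my @sets_to_process = qw( DOW SP500 has_trend_lines_day );

my $last_index = $#sets_to_process;   # index of the last element
my $num_sets   = @sets_to_process;    # an array in scalar context gives the count

print "last index: $last_index\n";    # prints "last index: 2"
print "count:      $num_sets\n";      # prints "count:      3"
```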

But in this case you don't need the index at all, and it is best to forget about $num_sets and just write this

for my $set ( @sets_to_process ) {
    my @set = get_ids_in_list($dbh, $set);
    push @sets, \@set;
}

Or even use map like this

my @sets = map [ get_ids_in_list($dbh, $_) ], @sets_to_process;
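
A rough, self-contained way to try the map form without a database is sketched below. The %fake_db hash and this stub version of get_ids_in_list are assumptions made up for the demonstration; the real function from the question presumably queries $dbh.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: the real IDs would come from the database
my %fake_db = (
    DOW                 => [ 1, 2, 3 ],
    SP500               => [ 2, 3, 4 ],
    has_trend_lines_day => [ 3, 4, 5 ],
);

# Stub with the same calling convention as in the question; $dbh is ignored
sub get_ids_in_list {
    my ( $dbh, $name ) = @_;
    return @{ $fake_db{$name} };
}

my $dbh;    # unused placeholder for the database handle
my @sets_to_process = qw( DOW SP500 has_trend_lines_day );

# Each [ ... ] builds a fresh anonymous array, so @sets holds one
# independent array reference per set name, however many there are
my @sets = map [ get_ids_in_list( $dbh, $_ ) ], @sets_to_process;

printf "%s: %s\n", $sets_to_process[$_], join ', ', @{ $sets[$_] }
    for 0 .. $#sets;
```

The resulting @sets can be handed to List::Compare->new(@sets) exactly as in the sample program above.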