Create individual files by matching records in one array against a unique list of names in another array in Perl

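I have two files:

File 1 contains a unique list of names; File 2 contains a list of names with additional data.

Note: File 2 can contain multiple records with the same name.

For example:

File 1: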

ARRON LYNCH
PATRICK MOLONEY
JAMIE MOTT
MICHELLE PAYNE
DANIEL STACKHOUSE
JORDAN CHILDS
LUKE NOLEN
... etc.

File 2:

ARRON LYNCH,WANGARATTA,RACE 1,BILLIEO (1),MARK STEPHENSON,C,1
PATRICK MOLONEY,WANGARATTA,RACE 1,DALLAS COWGIRL (2),BRENT STANLEY,CC,1
JAMIE MOTT,WANGARATTA,RACE 1,FREE FLYING STAR (3),JOHN MCARDLE,BBB,1
JAMES WINKS,WANGARATTA,RACE 2,AMERICAN WHISKEY (1),MICHAEL, WAYNE & JOHN HAWKES,BBB,2
TEODORE NUGENT,WANGARATTA,RACE 2,MATSUMOTO (2),MITCHELL BEER,CC,2
ALEXANDRA BRYAN,WANGARATTA,RACE 2,O'REG (3),ALLAN FITZGERALD,C,2
LUKE NOLEN,WANGARATTA,RACE 3,ALKAAMEL (1),DAVID & BEN HAYES & TOM DABERNIG,BBB,3
BILLY EGAN,WANGARATTA,RACE 3,CRUNCHIE (4),PATRICK PAYNE,AA,3
CAMPBELL RAWILLER,WANGARATTA,RACE 3,DANCING DUCK (5),RUSSELL OSBORNE,B,3
TEODORE NUGENT,WANGARATTA,RACE 4,DARCY EKCELS (1),RICHARD LAMING,A,4
BRAD RAWILLER,WANGARATTA,RACE 4,LOVE HURTS (3),RICKY MAUND,BBB,4
LUKE NOLEN,WANGARATTA,RACE 4,MESSAGE (4),JOHN MOLONEY,CC,4
JARROD FRY,WANGARATTA,RACE 5,DEFINIA (1),GWENDA JOHNSTONE,B,5
CLAYTON DOUGLAS,WANGARATTA,RACE 5,CHINA AFFAIR (2),JASON WARREN,A,5
DYLAN DUNN,WANGARATTA,RACE 5,AYTON (4),DAVID & BEN HAYES & TOM DABERNIG,BBB,5
TEODORE NUGENT,WANGARATTA,RACE 6,WIND FORCE (3),BEN BRISBOURNE,CCC,6
MADISON LLOYD,WANGARATTA,RACE 6,CARWELKIN (4),MARK THOMAS,CCC,6
ARRON LYNCH,WANGARATTA,RACE 6,DEVIL'S RAIN (5),MARK C. WEBB,B,6
DYLAN DUNN,WANGARATTA,RACE 7,TATUNKA (1),R F DONAT,AAA,7
JACK HILL,WANGARATTA,RACE 7,CAMPOBASSO (2),ROBERT HICKMOTT,AA,7
ARRON LYNCH,WANGARATTA,RACE 7,COONAWARRA (3),MARK C. WEBB,BBB,7
... etc.

Note: ARRON LYNCH appears 3 times in File 2.

I have been able to load File 1 and File 2 into arrays and produce File 3 (see the code below).

File 3:

ARRON LYNCH,WANGARATTA,RACE 1,BILLIEO (1),MARK STEPHENSON,C,1
ARRON LYNCH,WANGARATTA,RACE 6,DEVIL'S RAIN (5),MARK C. WEBB,B,6
ARRON LYNCH,WANGARATTA,RACE 7,COONAWARRA (3),MARK C. WEBB,BBB,7
PATRICK MOLONEY,WANGARATTA,RACE 1,DALLAS COWGIRL (2),BRENT STANLEY,CC,1
PATRICK MOLONEY,WANGARATTA,RACE 5,BEL'S BANNER (5),UDYTA CLARKE,A,5
PATRICK MOLONEY,WANGARATTA,RACE 6,BEAUTY BETTY (7),LEON & TROY CORSTENS,AAA,6
PATRICK MOLONEY,WANGARATTA,RACE 7,GREEN IVY (4),KEN KEYS,CCC,7
JAMIE MOTT,WANGARATTA,RACE 1,FREE FLYING STAR (3),JOHN MCARDLE,BBB,1
JAMIE MOTT,WANGARATTA,RACE 2,INSIDE EDGE (8),JOHN MCARDLE,A,2
JAMIE MOTT,WANGARATTA,RACE 4,BORONDINO DREAM (13E),TRENT BUSUTTIN & NATALIE YOUNG,BB,4
JAMIE MOTT,WANGARATTA,RACE 6,MECKLENBERG COUNTY (11),CINDY ALDERSON,BB,6
MICHELLE PAYNE,WANGARATTA,RACE 1,LA MARSA (4),MICHELLE PAYNE,CCC,1
DANIEL STACKHOUSE,WANGARATTA,RACE 1,LUNARES (5),MATHEW ELLERTON & SIMON ZAHRA,B,1
DANIEL STACKHOUSE,WANGARATTA,RACE 2,BON SHADOW (14),GWENDA JOHNSTONE,BB,2
DANIEL STACKHOUSE,WANGARATTA,RACE 2,SETTLE THE SCORE (18),JOHN & CHRIS LEDGER,B,2
DANIEL STACKHOUSE,WANGARATTA,RACE 4,MRS WHITTEN (10),CINDY ALDERSON,BB,4
... etc.

Note: ARRON LYNCH correctly appears 3 times in File 3, PATRICK MOLONEY correctly appears 4 times, and so on.

Here is the working code:

# Input File (File 1: Unique List)
my $unique_jockeys_file = "UNIQUE-LIST-OF-JOCKEYS-RIDING-TODAY.list";
open (INFILE, "<$unique_jockeys_file") or die "Could not open $unique_jockeys_file $!";
foreach(<INFILE>)
{ 
    push @ri_list, $_ unless ($_ eq "\n"); 
} 
close INFILE;

# Input File (File 2: All Jockeys Rides Today)
my $jockey_rides_file = "JOCKEY-RIDES-TODAY.list";
open (INFILE, "<$jockey_rides_file") or die "Could not open $jockey_rides_file $!";
foreach(<INFILE>)
{ 
    push @lin, $_ unless ($_ eq "\n"); 
} 
close INFILE;

# Output File (File 3)
my $jockey_rides_match_file = "JOCKEY-RIDES-TODAY-MATCHED.list";
open (OUTFILE, ">$jockey_rides_match_file");
foreach $ri (@ri_list)
{ 
    chomp $ri; 
    for (@lin) 
    { 
        if ($_ =~ /$ri/ ) 
        { 
            print OUTFILE $_; 
        } 
    } 
} 
close OUTFILE;

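I would also like to produce a separate file for each name, containing that name's matching records. For example, ARRON LYNCH has 3 matching records (which should go in ARRONLYNCH.txt), PATRICK MOLONEY has 4 (PATRICKMOLONEY.txt), and so on.

Here is my code so far. Unfortunately, I cannot work out why it does not work.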

foreach $ri (@ri_list) # Input File (Unique List of Jockeys Riding Today)
{ 
    chomp $ri;
    for (@lin) # Input File (All Jockeys Rides Today)
    { 
        $line = $_;
        chomp($line);

        my ($jockey, $racecourse, $racenum, $hnameandnum, $trainer, $TDRating, $PRO) = split(/,/, $line);

        $outfile = "$jockey.jocknumrides";        

        open (OUTFILE, ">$outfile");

        if ($jockey =~ /$ri/ )
        { 
            print OUTFILE "$jockey, $racecourse, $racenum, $hnameandnum, $trainer, $TDRating, $PRO\n";
            print "$jockey, $racecourse, $racenum, $hnameandnum, $trainer, $TDRating, $PRO\n";
        }
        close OUTFILE;
    } 
} 

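Any help would be appreciated.

Thanks in advance.

Please see whether the following demo code meets your requirements.

It processes the data slightly differently from your code: the rides are read into a hash keyed by jockey name, so each jockey's records can be looked up directly instead of being found by a regex match.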

use strict;
use warnings;
use feature 'say';

my $fname_jockeys = 'UNIQUE-LIST-OF-JOCKEYS-RIDING-TODAY.list';
my $fname_rides   = 'JOCKEY-RIDES-TODAY.list';
my $fname_matched = 'JOCKEY-RIDES-TODAY-MATCHED.list';
my $fname_rider   = 'JOCKEY-RIDES-TODAY-';

my $list_jockeys = read_jockeys($fname_jockeys);
my $list_rides   = read_rides($fname_rides);

save_matched_rides($fname_matched,$list_jockeys,$list_rides);
save_individual_riders($fname_rider,$list_jockeys,$list_rides);

sub save_individual_riders {
    my $fname_prefix = shift;
    my $list_jockeys = shift;
    my $list_rides   = shift;

    my @fields = qw/jockey racecourse racenum hnameandnum trainer TDRating PRO/;

    for my $jockey ( sort @$list_jockeys ) {
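        # Parenthesise uc(): a named unary operator binds more loosely than '.',
        # so "uc $jockey . '.list'" would upper-case the extension as well.
        my $fname = $fname_prefix . uc($jockey) . '.list';
        open my $fh, '>:encoding(utf8)', $fname
            or die "Couldn't open $fname: $!";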

        for my $ride ( @{$list_rides->{$jockey}} ) {
            say $fh join(',',@$ride{@fields});
        }

        close $fh;
    }
}

sub save_matched_rides {
    my $fname        = shift;
    my $list_jockeys = shift;
    my $list_rides   = shift;

    my @fields = qw/jockey racecourse racenum hnameandnum trainer TDRating PRO/;

    open my $fh, '>:encoding(utf8)', $fname
        or die "Couldn't open $fname: $!";

    for my $jockey ( sort @$list_jockeys ) {
        for my $ride ( @{$list_rides->{$jockey}} ) {
            say $fh join(',',@$ride{@fields});
        }
    }
    close $fh;
}

sub read_rides {
    my $fname = shift;

    my %rides = ();
    my @fields = qw/jockey racecourse racenum hnameandnum trainer TDRating PRO/;

    open my $fh, '<:encoding(utf8)', $fname
        or die "Couldn't open $fname: $!";

    while( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;
        my %ride;
        @ride{@fields} = split(',', $line);
        push @{$rides{$ride{jockey}}}, \%ride;
    }

    close $fh;

    return \%rides;
}

sub read_jockeys {
    my $fname = shift;

    my @data;

    open my $fh, '<:encoding(utf8)', $fname
        or die "Couldn't open $fname: $!";

    while( <$fh> ) {
        chomp;
        next if /^\s*$/;
        push @data, $_;
    }

    close $fh;

    return \@data;
}
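
One caveat with the sample data: a trainer entry such as MICHAEL, WAYNE & JOHN HAWKES contains a comma, so a plain split(',', $line) yields eight pieces for that line and the last fields end up under the wrong keys. The sketch below is one possible way around that, under the assumption that only the trainer field can ever contain commas. parse_ride is a hypothetical helper, not part of the code above. It takes the first four fields from the front, the last two from the back, and joins whatever remains as the trainer.

```perl
use strict;
use warnings;

my @fields = qw/jockey racecourse racenum hnameandnum trainer TDRating PRO/;

# Hypothetical helper: returns a hash ref for one ride, or nothing
# if the line has fewer than seven comma-separated pieces.
# Assumption: only the trainer field may contain commas.
sub parse_ride {
    my ($line) = @_;
    my @parts = split /,/, $line, -1;   # -1 keeps trailing empty fields
    return if @parts < 7;

    my @head = splice @parts, 0, 4;     # jockey .. hnameandnum
    my @tail = splice @parts, -2;       # TDRating, PRO
    my $trainer = join ',', @parts;     # whatever is left in the middle

    my %ride;
    @ride{@fields} = (@head, $trainer, @tail);
    return \%ride;
}

my $ride = parse_ride(
    'JAMES WINKS,WANGARATTA,RACE 2,AMERICAN WHISKEY (1),MICHAEL, WAYNE & JOHN HAWKES,BBB,2'
);
print "$ride->{trainer}\n";    # MICHAEL, WAYNE & JOHN HAWKES
```

If the file could contain quoted fields or commas in other columns, a proper CSV parser would be a safer choice than this positional approach.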

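Here is one possible way to write the records to separate files, one per jockey.

It stores the data from File 2 in the hash %data, keyed by jockey name.

Then, for each name in the jockey list (File 1), it checks whether that name exists as a key in the hash; if it does, it opens a file for writing and writes that jockey's records to it.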

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use feature 'say';

my %data;

my $file2 = 'f2.txt';
open my $fh, "<", $file2 or die "$file2: $!";

while (<$fh>) {
    chomp;
    next unless length;
    my $jockey = (split/,/, $_, 2)[0];

    push @{ $data{$jockey} }, $_;
}

close $fh or die $!;
# print Dumper \%data;

my $file1 = 'f1.txt';
open $fh, "<", $file1 or die "$file1: $!";

while (my $jockey = <$fh>) {
    chomp $jockey;
    if (exists $data{$jockey}) {
        my $outfile = "$jockey.jocknumrides";
        open my $output, '>', $outfile or die $!;

        for my $line (@{ $data{$jockey} }) {
            say $output $line;  
        }
        close $output or die $!;
    }
}

close $fh or die $!;
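
As for why the attempt in the question does not work: it opens the output file with '>' inside the inner loop, so the file is truncated again for every line of File 2 and for every name in File 1, whether or not the line matches. It also names the file after the jockey on the current line rather than after the name being searched for. A minimal sketch of a fix that stays close to the original one-pass-per-name structure follows. The function name write_rides_per_jockey and the $dir parameter are made up for illustration, the lines are assumed to be chomped already, and names are compared with eq rather than a regex so that a name cannot match as a substring of another.

```perl
use strict;
use warnings;

# Hypothetical helper: writes one file per jockey into $dir.
# $names is an array ref of jockey names (File 1),
# $lines is an array ref of chomped CSV lines (File 2).
sub write_rides_per_jockey {
    my ($dir, $names, $lines) = @_;

    for my $name (@$names) {
        # Collect this jockey's lines first ...
        my @matches = grep {
            my ($jockey) = split /,/, $_, 2;
            defined $jockey && $jockey eq $name;
        } @$lines;
        next unless @matches;    # no rides, no file

        # ... then open the file once, outside the loop over lines.
        (my $base = $name) =~ s/\s+//g;    # "ARRON LYNCH" -> "ARRONLYNCH"
        my $outfile = "$dir/$base.txt";
        open my $out, '>', $outfile or die "Could not open $outfile: $!";
        print {$out} "$_\n" for @matches;
        close $out or die "Could not close $outfile: $!";
    }
    return;
}
```

Called as write_rides_per_jockey('.', \@ri_list, \@lin) with both arrays chomped, this would produce files such as ARRONLYNCH.txt in the current directory, matching the names asked for in the question.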