Duplicate records in .CSV: how to ignore duplicates with identical values and warn only on differing values in Perl
The following code checks for duplicates in a CSV file where the TO column is "USD". I need help figuring out how to compare the duplicate values it finds: if the duplicates carry the same rate, as in the sample below, Perl should not issue any warning. The Perl file is named Source; just change the directory and run it.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my %data;
my %dupes;
my @rows;

my $csv = Text::CSV->new()
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, "<", 'D:\Longview\ENCDEVD740\DataServers\ENCDEVD740\lvaf\inbound\data\enc_meroll_fxrate_soa_load.csv'
    or die "Cannot open CSV: $!";

while ( my $row = $csv->getline($fh) ) {
    # keep every row, in input order
    push @rows, $row;

    # join the key fields with $;, Perl's 'multidimensional
    # array emulation' subscript character
    my $key = join( $;, @{$row}[ 0, 1 ] );
    # for a single key field:   my $key = $row->[$keyfieldindex];
    # for full-line duplicates: my $key = join( $;, @$row );

    # if %data already has an entry for this key, record a dupe
    if ( exists $data{$key} ) {    # duplicate
        # on the first duplicate, also record the original row
        if ( not exists $dupes{$key} ) {
            push @{ $dupes{$key} }, $data{$key};
        }
        # record the duplicate row
        push @{ $dupes{$key} }, $row;
    }
    else {
        $data{$key} = $row;
    }
}
$csv->eof or $csv->error_diag();
close $fh;

# print out duplicates:
warn "Duplicate Values:\n";
warn "-----------------\n";
foreach my $key ( keys %dupes ) {
    my @keys = split( $;, $key );
    if ( $keys[1] ne 'USD' or $keys[0] eq 'FROMCURRENCY' ) {
        # skip out-of-scope currencies and the header row
        next;
    }
    else {
        print "Key: @keys\n";
        foreach my $dupe ( @{ $dupes{$key} } ) {
            print "\tData: @$dupe\n";
        }
    }
}
Sample data:
FROMCURRENCY,TOCURRENCY,RATE
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
AUD,USD,0.68635
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
Different Values for duplicates
As @Håkon wrote, all of your duplicates actually carry the same rate, so they should not be treated as duplicates at all. That said, it might be an idea to store the rates in a hash mapped to each currency pair. That way you don't need to check for duplicates on every iteration and can rely on the uniqueness of hash keys.
It's great that you're using a proper CSV parser, but since the data looks reliable, here is an example that tracks duplicates with a single hash, splitting each line on ",".
#!/usr/bin/env perl
use warnings;
use strict;

my $result = {};    # $result->{FROM}{TO}{RATE} = 1 for every rate seen
my $format = "%-4s | %-4s | %s\n";

while ( my $line = <DATA> ) {
    chomp $line;
    my ( $from, $to, $rate ) = split( /,/, $line );
    # identical rates collapse onto the same hash key,
    # so only distinct rates survive
    $result->{$from}{$to}{$rate} = 1;
}

printf( $format, "FROM", "TO", "RATES" );
printf( "%s\n", "-" x 40 );

foreach my $from ( keys %$result ) {
    foreach my $to ( keys %{ $result->{$from} } ) {
        my @rates = keys %{ $result->{$from}{$to} };
        next if @rates < 2;    # a single distinct rate is not a conflict
        printf( $format, $from, $to, join( ", ", @rates ) );
    }
}
__DATA__
AED,USD,0.272257011
ANG,USD,0.557584545
ANG,USD,1.557584545
ARS,USD,0.01421147
ARS,USD,0.01421147
ARS,USD,0.01421147
AUD,USD,0.68635
AUD,USD,1.68635
AUD,USD,2.68635
I changed the test data to include duplicates with identical rates as well as duplicates with different rates, and this result is printed:
FROM | TO | RATES
----------------------------------------
ANG | USD | 1.557584545, 0.557584545
AUD | USD | 1.68635, 0.68635, 2.68635
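If you want to keep Text::CSV for the parsing, the same idea carries over. Below is a minimal sketch (not from the original post) that combines the question's Text::CSV loop with the rate-hash approach; the filename rates.csv is a placeholder for the original path, and the column layout is assumed to be FROMCURRENCY,TOCURRENCY,RATE as in the sample data.
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new()
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# 'rates.csv' is a placeholder; point it at the real input path
open my $fh, '<', 'rates.csv' or die "Cannot open CSV: $!";

# %rates maps a FROM/TO pair to the set of distinct rates seen for it
my %rates;
while ( my $row = $csv->getline($fh) ) {
    my ( $from, $to, $rate ) = @$row;
    next if $from eq 'FROMCURRENCY';    # skip the header row
    next if $to ne 'USD';               # in-scope currencies only
    $rates{ join( $;, $from, $to ) }{$rate} = 1;
}
$csv->eof or $csv->error_diag();
close $fh;

# warn only when a pair appeared with more than one distinct rate
foreach my $key ( sort keys %rates ) {
    my @distinct = keys %{ $rates{$key} };
    next if @distinct < 2;
    my ( $from, $to ) = split( /$;/, $key );
    warn "Conflicting rates for $from -> $to: @distinct\n";
}
With the original sample data this prints nothing, because every repeated pair carries the same rate; rows like the ANG and AUD lines in the modified test data would each trigger one warning.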