Fastest way to populate a hash in Perl
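I am trying to populate a hash in Perl from a file of about 564k lines. The code takes about 1.6~2.1 seconds to run, while equivalent C# code finishes in about 0.8 seconds. Is there a better way to do this in Perl?
This is what I have tried so far: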
# 1 - this version takes ~1.6 seconds to fill the hash from a file with ~564,000 lines
my %voc;
open(F,"<$voc_file");
while(defined(my $line=<F>)) {
    chomp($line);
    $voc{$line} = 1;
}
close(F);
And this:
# 2 - this version takes ~2.1 seconds to fill the hash from a file with ~564,000 lines
my %voc;
my @voc_keys;
my @array_of_ones;
open(F,"<$voc_file");
my $voc_keys_index = 0;
while(defined(my $line=<F>)) {
    chomp($line);
    $voc_keys[$voc_keys_index] = $line;
    $array_of_ones[$voc_keys_index] = 1;
    $voc_keys_index ++;
}
@voc{@voc_keys} = @array_of_ones;
close(F);
In C#, I am using:
var voc = new Dictionary<String, int>();
foreach (string line in File.ReadLines(pathToVoc_file))
{
    var trimmedline = line.TrimEnd(new char[] { '\n' });
    voc[trimmedline] = 1;
}
and it takes only 700~800 ms.
Of course C# will be faster.
You can save a little time and some memory by replacing
$voc{$line} = 1; ... if ($voc{$key}) { ... } ...
with
undef $voc{$line}; ... if (exists($voc{$key})) { ... } ...
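A minimal sketch of why the lookup has to change along with the store, assuming a hypothetical three-word list in place of the real file. Once the values are undefined, a truthiness test on the value fails even for keys that are present, so membership must be checked with exists:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the vocabulary file.
my @words = qw(apple banana cherry);

my %voc;
undef $voc{$_} for @words;    # create each key with an undefined value

# A truthiness test no longer works: the stored value is undef, which is false.
my $truthy = $voc{apple} ? 1 : 0;

# exists() tests for the key itself, regardless of its value.
my $present = exists $voc{apple}  ? 1 : 0;
my $absent  = exists $voc{durian} ? 1 : 0;

print "truthy=$truthy present=$present absent=$absent\n";
```

Any existing code that still tests if ($voc{$key}) would silently treat every key as missing after the switch, so both halves of the replacement need to be made together.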
Avoiding storing 1 as the value and using exists definitely saves time and memory. You can gain a little more by removing the block from the loop:
my %voc;
open(F,"<$voc_file");
chomp, undef $voc{$_} while <F>;
close(F);
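For anyone who wants to try the statement-modifier form in isolation, here is a self-contained sketch. The temporary file and its three words are assumptions standing in for $voc_file, and the lexical filehandle with a three-argument open is my substitution for the bareword F, not something the timings above were measured with:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for $voc_file: a small temporary file of words.
my ($tmp_fh, $voc_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "$_\n" for qw(alpha beta gamma);
close($tmp_fh) or die "close failed: $!";

my %voc;
open(my $fh, '<', $voc_file) or die "Cannot open $voc_file: $!";
# Statement-modifier form: the loop body is a single expression with no block.
# `while <$fh>` implicitly assigns each line to $_ and tests it with defined().
chomp, undef $voc{$_} while <$fh>;
close($fh);

print join(',', sort keys %voc), "\n";
```

The keys are stored without their trailing newline because chomp runs on $_ before the hash element is created.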
Benchmark results (using 20-character lines):
Benchmark: running ikegami, original, statementmodifier, statementmodifier_undef for at least 10 CPU seconds...
ikegami: 10 wallclock secs ( 9.54 usr + 0.46 sys = 10.00 CPU) @ 2.10/s (n=21)
original: 10 wallclock secs ( 9.62 usr + 0.45 sys = 10.07 CPU) @ 2.09/s (n=21)
statementmodifier: 10 wallclock secs ( 9.61 usr + 0.48 sys = 10.09 CPU) @ 2.18/s (n=22)
statementmodifier_undef: 11 wallclock secs ( 9.85 usr + 0.48 sys = 10.33 CPU) @ 2.23/s (n=23)
Benchmark code:
use strict;
use warnings;
use Benchmark 'timethese';
my $voc_file = 'rand.txt';
sub check {
    my ($voc) = @_;
    unless (keys %$voc == 564000) {
        warn "bad number of keys ", scalar keys %$voc;
    }
    chomp(my $expected_line = `head -1 $voc_file`);
    unless (exists $voc->{$expected_line}) {
        warn "bad data";
    }
    return;
}
timethese(-10, {
    'statementmodifier' => sub {
        my %voc;
        open(F,"<$voc_file");
        chomp, $voc{$_} = 1 while <F>;
        close(F);
        #check(\%voc);
        return;
    },
    'statementmodifier_undef' => sub {
        my %voc;
        open(F,"<$voc_file");
        chomp, undef $voc{$_} while <F>;
        close(F);
        #check(\%voc);
        return;
    },
    'original' => sub {
        my %voc;
        open(F,"<$voc_file");
        while(defined(my $line=<F>)) {
            chomp($line);
            $voc{$line} = 1;
        }
        close(F);
        #check(\%voc);
        return;
    },
    'ikegami' => sub {
        my %voc;
        open(F,"<$voc_file");
        while(defined(my $line=<F>)) {
            chomp($line);
            undef $voc{$line};
        }
        close(F);
        #check(\%voc);
        return;
    },
});
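If the raw rates are hard to compare by eye, the same Benchmark module also exports cmpthese, which prints a table of relative differences. The sketch below is a variation rather than the benchmark that produced the numbers above. It assumes hypothetical in-memory 20-character keys instead of rand.txt, so it measures only hash insertion, not file reading:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical in-memory data: 20-character keys, far fewer than the 564,000-line file.
my @lines = map { sprintf("%020d", $_) } 1 .. 10_000;

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    assign_one => sub {
        my %voc;
        $voc{$_} = 1 for @lines;
        return;
    },
    assign_undef => sub {
        my %voc;
        undef $voc{$_} for @lines;
        return;
    },
});
```

Because the file I/O is left out, the percentages it reports are not directly comparable to the timings above.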
(The original, incorrect answer has been replaced with this one.)