Perl: finding the common lines between text files in a directory
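I found code that reads all the text files in a directory, but I don't know how to find the lines they have in common. Please help me with the code, or point me to the areas I need to explore further. There is so much to learn and so little time.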
use strict;
use warnings;
use English;

my $dir = 'C:\Perl_Example\Data';
foreach my $fp (glob("$dir/*.txt"))
{
    # print the file name as a header
    printf "%s\n", $fp;
    # open each file in the directory for reading
    open my $fh, "<", $fp or die "can't read open '$fp': $OS_ERROR";
    while (<$fh>)
    {
        # print the file content
        printf " %s", $_;
    }
    close $fh or die "can't read close '$fp': $OS_ERROR";
}
Here is one way to "find the common lines". Of course, in Perl there is more than one way to do it :)
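#!/usr/bin/perl
use strict;
use warnings;

# Map each line to the set of files it occurs in.
my %h;
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "$file: $!\n";
    while (<$fh>) {
        chomp;
        # hash of hashes: a line repeated within one file is counted once
        $h{$_}{$file} = 1;
    }
    close $fh;
}

# Report every line that occurs in more than one file.
for (sort keys %h) {
    my @files = sort keys %{ $h{$_} };
    if (@files > 1) {
        print "line <$_>\n";
        print " occurs in ", join(", ", @files), "\n";
    }
}
exit 0;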
Now the test files, named {1,2,3}:
% cat 1
PING YA.RU (87.250.250.242): 56 data bytes
64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms
64 bytes from 87.250.250.242: icmp_seq=1 ttl=249 time=14.943 ms
64 bytes from 87.250.250.242: icmp_seq=2 ttl=249 time=14.381 ms
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms
% cat 2
PING YA.RU (87.250.250.242): 56 data bytes
64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms
% cat 3
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms
And a test run of the script:
% ./try.pl 1 2 3
line <64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms>
 occurs in 1, 2
line <64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms>
 occurs in 1, 2, 3
line <64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms>
 occurs in 1, 2, 3
line <PING YA.RU (87.250.250.242): 56 data bytes>
 occurs in 1, 2
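The script above reports lines shared by at least two files. If "common" should instead mean lines present in every file of the directory, the same hash idea can probably be tightened as follows. This is only a minimal sketch: `lines_in_all_files` is a hypothetical helper name, the directory is assumed to be passed as the first command-line argument (defaulting to the current directory), and the `*.txt` pattern is taken from the question. It also assumes the directory path contains no spaces, since `glob` splits its pattern on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the sorted
# list of lines that occur in every one of the given files.
# Assumes each file name is passed only once.
sub lines_in_all_files {
    my @files = @_;
    my %seen_in;    # line => { file name => 1, ... }
    for my $file (@files) {
        open(my $fh, '<', $file) or die "$file: $!\n";
        while (my $line = <$fh>) {
            chomp $line;
            $seen_in{$line}{$file} = 1;
        }
        close $fh;
    }
    # Keep only the lines seen in as many distinct files as were given.
    my @common = grep {
        scalar(keys %{ $seen_in{$_} }) == scalar(@files)
    } keys %seen_in;
    return sort @common;
}

# Run only when executed as a script, not when loaded from other code.
unless (caller) {
    my $dir = @ARGV ? $ARGV[0] : '.';
    print "$_\n" for lines_in_all_files(glob("$dir/*.txt"));
}
```

With the three test files above (given a `.txt` extension), this should print only the `icmp_seq=3` and `icmp_seq=4` lines, since those are the two that the earlier run shows occurring in 1, 2, 3.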