Merging N files based on their first column in Perl
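My question is similar to a previously posted question.
I have many files that I need to merge based on whether the ID in the first column is present. When I merge them, however, I get many empty values in the output file; I want those to be zero whenever the ID is absent from another file. The example below uses only two files, but I have many sample files in this same (tabular) format.
For example: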
File1
ID Value
123 1
231 2
323 3
541 7
File2
ID Value
541 6
123 1
312 3
211 4
Expected Output:
ID File1 File2
123 1 1
231 2 0
323 3 0
541 7 6
312 0 3
211 0 4
Obtaining Output:
ID File1 File2
123 1 1
231 2
323 3
541 7 6
312 undef 3
211 undef 4
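As you can see above, I do get output, but the File2 column is left blank instead of getting a zero, and the File1 column contains undef values. I added a check for undef values, so my final output now shows zeros in place of undef, but I still get those blanks. Please find my code below (hardcoded for two files only).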
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
my $path = "/home/pranjay/Projects/test";
my @files = ("s1.txt","s2.txt");
my %classic_com;
my $cnt;
my $classic_txt;
my $sample_cnt = 0;
my $classic_txtcomb = "test_classic.txt";
open($classic_txt,">$path/$classic_txtcomb") or die "Couldn't open file
$classic_txtcomb for writing,$!";
print $classic_txt "#ID\t"."file1\tfile2\n";
foreach my $file(@files){
$sample_cnt++;
print "$sample_cnt\n";
open($cnt,"<$path/$file")or die "Couldn't open file $file for reading,$!";
while(<$cnt>){
chomp($_);
my @count = ();
next if($_=~/^ID/);
my @record=();
@record=split(/\t/,$_);
my $scnt = $sample_cnt -1;
if((exists($classic_com{$record[0]})) and ($sample_cnt > 0)){
${$classic_com{$record[0]}}[$scnt]=$record[1];
}else{
$count[$scnt] = "$record[1]";
$classic_com{$record[0]}= [@count];
}
}
}
my %final_txt=();
foreach my $key ( keys %classic_com ) {
#print "$key: ";
my @val = @{ $classic_com{$key} };
my @v;
foreach my $i ( @val ) {
if(not defined($i)){
$i = 0;
push(@v, $i);
}else{
push(@v, $i);
next;
}
}
$final_txt{$key} = [@v];
}
#print Dumper %classic_com;
while(my($key,$value)=each(%final_txt)){
my $val=join("\t", @{$value});
print $classic_txt "$key\t"."@{$value}"."\n";
}
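Just read the input files into a hash of arrays. The top-level key is the ID, and each inner array holds the value from file i at position i. When printing, use the // defined-or operator to replace undefs with zeros: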
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my %merged;          # $merged{ID}[i] = value of ID in the i-th file
my $file_tally = 0;  # number of files read so far
while (my $file = shift) {
    open my $in, '<', $file or die "$file: $!";
    <$in>;  # skip the header
    while (<$in>) {
        my ($id, $value) = split;
        $merged{$id}[$file_tally] = $value;
    }
    ++$file_tally;
}
for my $id (keys %merged) {
    my @values = @{ $merged{$id} };
    # Slice up to the file count so that missing trailing entries
    # show up as undef, which // then turns into 0.
    say join "\t", $id, map $_ // 0, @values[0 .. $file_tally - 1];
}
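To see why the slice over the full file count matters: an ID that appears only in the first file has a one-element array, so looping over the array itself never reaches the missing second column, which is where the blanks in the question come from. Below is a minimal sketch of just that padding step. The helper name `pad_row` is hypothetical and does not appear in the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): return exactly
# $n_files values for one ID, with 0 in every slot that was never set.
sub pad_row {
    my ($values, $n_files) = @_;
    # Slicing past the end of the array yields undef for the missing
    # slots; // (defined-or, Perl 5.10+) replaces each undef with 0.
    return map { $_ // 0 } @{$values}[0 .. $n_files - 1];
}

my @only_in_file1 = (2);          # e.g. ID 231: present in file 1 only
my @only_in_file2 = (undef, 3);   # e.g. ID 312: present in file 2 only

print join("\t", 231, pad_row(\@only_in_file1, 2)), "\n";
print join("\t", 312, pad_row(\@only_in_file2, 2)), "\n";
```

Both rows come out with two value columns: the leading gap for ID 312 and the trailing gap for ID 231 are handled by the same slice.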
program.pl
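my %val;
# $val{ID}{filename} = value; header lines do not match the regex and are skipped
/ (\d+) \s+ (\d+) /x and $val{$1}{$ARGV} = $2 while <>;
# header row: 'ID' followed by the sorted set of all file names seen
pr( 'ID', my @f = sort keys %{{map%$_,values%val}} );
# one row per ID; the hash slice over @f yields undef for missing files, // makes it 0
pr( $_, map$_//0, @{$val{$_}}{@f} ) for sort keys %val;
sub pr{ print join("\t",@_)."\n" }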
Run:
perl program.pl s1.txt s2.txt
ID s1.txt s2.txt
123 1 1
211 0 4
231 2 0
312 0 3
323 3 0
541 7 6
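The compact program above is dense, so here is a more explicit sketch of the same hash-of-hashes idea. The sub names `read_tables` and `format_table` are hypothetical, and the sketch makes two choices of its own: columns follow the order of the files on the command line rather than sorted file names, and IDs are sorted numerically.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated "ID value" tables and
# return a hash ref where $val->{$id}{$file} is the value of $id in $file.
sub read_tables {
    my @files = @_;
    my %val;
    for my $file (@files) {
        open my $in, '<', $file or die "$file: $!";
        while (my $line = <$in>) {
            # Header lines such as "ID Value" do not match and are skipped.
            next unless $line =~ /^\s*(\d+)\s+(\d+)/;
            $val{$1}{$file} = $2;
        }
        close $in;
    }
    return \%val;
}

# Hypothetical helper: build the merged table, one tab-joined string per row.
sub format_table {
    my ($val, @files) = @_;
    my @rows = join "\t", 'ID', @files;    # header row
    for my $id (sort { $a <=> $b } keys %$val) {
        # A file with no entry for this ID gives undef, which // turns into 0.
        push @rows, join "\t", $id, map { $val->{$id}{$_} // 0 } @files;
    }
    return @rows;
}

# Only run when file names are given, e.g.: perl merge.pl s1.txt s2.txt
if (@ARGV) {
    print "$_\n" for format_table(read_tables(@ARGV), @ARGV);
}
```

Keying the inner level by file name instead of by position means a missing file simply has no entry, so no array ever needs padding.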