Calling upon keys within a hash in Perl

This is an extension of a question asked previously on this site.

I currently have the following code:

#!/usr/bin/perl
use strict;
use warnings;

#Print Directory

print "Please provide the directory containing the FASTQ files from your Illumina MiSeq run \n";
my $FASTQ = <STDIN>;
chomp ($FASTQ);

print "Please provide the minimum overlap between the two reads in bp";
my $min = <STDIN>;
chomp ($min);

print "Please provide the maximum overlap between the two reads in bp";
my $max = <STDIN>;
chomp ($max);

print "Now provide the output directory for your merged fastq reads";
my $output = <STDIN>;
chomp ($output);

#Open Directory

my $dir = $FASTQ;
opendir(DIR, $dir) or die "Cannot open $dir: $!";
my @reads = grep { /.fastq/ } readdir DIR;
closedir DIR;

sub parse_fastq_filename {
    # Strip the suffix
    my $filename = shift;

    # Parse Sample-ID_Adapter-Sequence_L001_R1_001
    my ($sample_id, $adapter_sequence, $L001, $format, $set_number) = split /_/, $filename;

    return {
        filename            => $filename,
        sample_id           => $sample_id,
        adapter_sequence    => $adapter_sequence,
        L001                => $L001,
        format              => $format,
        '001'               => $set_number
    };
}


# The pairs of files will be stored within the following hash.
my %pairs;

# List just the *.fastq files
for my $filename (@reads) {
    # Parse the filename into a hash reference
    my $fastq = parse_fastq_filename($filename);

    # Put each parsed fastq filename into its pair
    $pairs{ $fastq->{sample_id} }{ $fastq->{format} } = $fastq;
}

for my $sample (values %pairs) {
    # Go through each pair in the sample
    for my $fastq (values %$sample) {
        print "$fastq->{filename} has format $fastq->{format}\n";
    }
}

for my $forward (values %pairs) {
    for my $fastq (values %$forward) {

    }
}

#print the keys within the hash
foreach (keys %pairs){
    print "$_ => $pairs{$_}\n";
}

#place the hash into an array
my @unique = keys %pairs;
print @unique;

#change directory to the user-inputted directory and merge reads 
chdir $dir;
`/usr/local/flash/flash @array[0] @array[1] -m $min -M $max -d $output`;

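I have finally managed to pair the forward and reverse fastq files with each other, as the Unix command requires. Now I am stuck on how to call upon the individual components of each pair.

I have considered accessing the hash by taking the keys as user input, but those values are generated within the script, and I do not want to force the user to enter them before running the Unix script.

In addition, all of the examples I have looked at already have a fixed number of keys in the hash. Here the number of keys will vary with the number of files the user wants to merge.

I know I want a loop so that the command can be run several times; the FLASH program accepts only one forward and one reverse fastq file at a time. The forward and reverse fastq reads therefore have to be looped over so that every one of them is processed.

How do I pull out the specific files I want to use after pairing them up in a hash?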

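My best guess is that you want to pair up all the .fastq files in the given directory and call the flash utility once for each pair.

Your own code has a number of problems, and you seem to be keeping more information in the %pairs hash than you need (I think you only need the file name, the sample ID, and the format), so I wrote this.

I have also used a simple glob instead of opendir / readdir / closedir, which seems a better choice.

I have tested it as far as I am able, and it looks fine.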
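To illustrate the difference between the two ways of listing a directory, here is a minimal sketch that runs both against a scratch directory. The file names and the temporary directory are made up for the example and are not part of your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Basename qw/ basename /;
use File::Temp qw/ tempdir /;

# Hypothetical scratch directory with made-up file names, for illustration only
my $dir = tempdir( CLEANUP => 1 );
for my $name ( qw/ A_ACGT_L001_R1_001.fastq A_ACGT_L001_R2_001.fastq notes.txt / ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# readdir returns bare names, including '.' and '..', so it needs a filter
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @via_readdir = sort grep { /\.fastq\z/ } readdir $dh;
closedir $dh;

# glob filters by pattern itself and returns paths that include the directory
my @via_glob = sort map { basename $_ } glob "$dir/*.fastq";

print "readdir: @via_readdir\n";
print "glob:    @via_glob\n";
```

Both lists should hold the same two .fastq names. One caveat: the built-in glob splits its pattern on whitespace, so a directory name containing spaces would need something like bsd_glob from File::Glob instead.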

#!/usr/bin/perl
use strict;
use warnings;

use File::Basename qw/ basename /;

my $flash = '/usr/local/flash/flash';

print "Please provide the directory containing\n";
print "the fastq files from your Illumina MiSeq run: ";
my $fastq_file_dir = <STDIN>;
chomp $fastq_file_dir;

print "Please provide the minimum overlap between the two reads in bp: ";
my $min_overlap = <STDIN>;
chomp $min_overlap;

print "Please provide the maximum overlap between the two reads in bp: ";
my $max_overlap = <STDIN>;
chomp $max_overlap;

print "Now provide the output directory for your merged fastq reads: ";
my $out_dir = <STDIN>;
chomp $out_dir;

my @files = glob "$fastq_file_dir/*.fastq";

my %pairs;

for my $fastq_file ( @files ) {

    my $file = basename $fastq_file;
    my ($sample_id, $format) = (split /_/, $file)[0,3];

    $pairs{ $sample_id }{ $format } = $file;
}

printf "Processing %d pairs of FASTQ files\n\n", scalar keys %pairs;
chdir $fastq_file_dir or die "Cannot change to $fastq_file_dir: $!";

for my $sample ( sort keys %pairs ) {

    my $pair = $pairs{$sample};
    my ($forward, $reverse) = @{$pair}{qw/ R1 R2 /};

    print "Forward: $forward\n";
    print "Reverse: $reverse\n";
    print "\n";

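    # List form of system bypasses the shell, so odd characters in file names are safe
    my @cmd = ( $flash, $forward, $reverse,
                '-m', $min_overlap, '-M', $max_overlap, '-d', $out_dir );
    system(@cmd) == 0 or warn "flash failed for sample $sample (status $?)\n";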
}
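
One thing the loop above does not guard against is a sample that has only one of its two files. The hash slice then returns undef for the missing mate, and the command would be built with an empty argument. A minimal sketch of one way to separate complete pairs from incomplete ones follows; the file names are invented for the example (sample "B" deliberately has no R2 file), and it only prints what it would do rather than calling flash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names, for illustration only; sample "B" has no R2 mate
my @files = qw/
    A_ACGT_L001_R1_001.fastq
    A_ACGT_L001_R2_001.fastq
    B_TTGA_L001_R1_001.fastq
/;

# Group by sample ID (field 0), keyed by read direction (field 3)
my %pairs;
for my $file ( @files ) {
    my ($sample_id, $format) = (split /_/, $file)[0,3];
    $pairs{$sample_id}{$format} = $file;
}

# Keep one [forward, reverse] entry per complete pair; note the rest separately
my (@commands, @incomplete);
for my $sample ( sort keys %pairs ) {
    my ($forward, $reverse) = @{ $pairs{$sample} }{qw/ R1 R2 /};

    if ( defined $forward and defined $reverse ) {
        push @commands, [ $forward, $reverse ];
    }
    else {
        push @incomplete, $sample;
    }
}

print "Would merge: @$_\n" for @commands;
print "Skipping incomplete sample: $_\n" for @incomplete;
```

Because the loop walks over keys %pairs, it does not matter how many samples the directory contains, and nobody has to type the keys in by hand.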