Extract only the first sequence from a FASTA file

Question

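I want to extract only the first sequence from a FASTA file that contains multiple sequences. I have the code below, but I can't get the loops to work together properly.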

while (my $line = <$in_fh>) {
    chomp $line;
    for (my $i = 1; $i <= 1; $i++) {
        print $out_fh $line;
    }
}

close $out_fh;

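I think things are going wrong inside the while loop, but whatever I try, the output is not right. For example, I tried moving the for loop outside, but that didn't work. Is it the type of loop? Any pointers would be much appreciated.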

Answer 1

If you only need the first line of the input file, you don't need a while loop.

my $line = <$in_fh>;
print $out_fh $line;

Edit:

After researching the FASTA format, I think it is complicated enough that you shouldn't parse it manually. Instead, you should use BioPerl.

Edit 2:

Here is a working example using BioPerl:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;

my $fasta_file = shift @ARGV or die "Usage: $0 FASTA_FILE\n";

my $seqin = Bio::SeqIO->new( -format => 'Fasta', -file => $fasta_file )
  or die "can't load fasta file: $fasta_file\n";

my $seqobj = $seqin->next_seq();

my $sequence = $seqobj->seq();

print $sequence;

Answer 2

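Every FASTA record header starts with >, and the sequence itself should not contain that character, so it should be safe to keep reading lines until you see a second line that starts with >.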

# Print the first line (the header of the first record) no matter what.
my $line = <$in_fh>;
print $line;

while ($line = <$in_fh>) {
    # A line starting with ">" is the header of the next record, so stop there.
    last if $line =~ /^>/;
    print $line;
}
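The question writes to an output filehandle, while the loop above prints to STDOUT. Here is a minimal sketch of the same idea wired to filehandles. The sub name copy_first_record and the header counter are my own choices for illustration, not part of the original answer. Unlike the loop above, it skips any lines that come before the first header instead of printing the first line unconditionally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in_fh to $out_fh until a second header line is seen.
# Returns the number of lines written.
# (copy_first_record is a hypothetical helper name used for illustration.)
sub copy_first_record {
    my ($in_fh, $out_fh) = @_;
    my $headers = 0;
    my $written = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            last if ++$headers > 1;    # second header: first record is done
        }
        next unless $headers;          # ignore anything before the first header
        print {$out_fh} $line;
        $written++;
    }
    return $written;
}

# Small driver: runs only when exactly two file names are given.
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in_fh,  '<', $in_file  or die "Cannot read $in_file: $!\n";
    open my $out_fh, '>', $out_file or die "Cannot write $out_file: $!\n";
    copy_first_record($in_fh, $out_fh);
    close $in_fh;
    close $out_fh or die "Cannot close $out_file: $!\n";
}
```

Saved under a name of your choosing, it would be run with the input file and the output file as its two arguments. Lines are copied without chomp, so the line breaks of the original record are preserved.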

Answer 3

I know you want Perl, but an awk solution is shorter:

awk '/^>/{if(N)exit;++N;} {print;}' in.fa
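The awk program counts header lines in N, prints every line, and exits as soon as it meets a second header. If you would rather stay in Perl, one comparably short alternative (my own suggestion, not from the answers here) is to set the input record separator $/ to a newline followed by >, so that a single read returns the whole first record. This sketch assumes the file starts directly with a header line. The sub name first_fasta_record is hypothetical.

```perl
use strict;
use warnings;

# Return the first FASTA record (header plus sequence lines) as one string.
# Assumes the input begins with a ">" header line.
# (first_fasta_record is a hypothetical helper name used for illustration.)
sub first_fasta_record {
    my ($fh) = @_;
    local $/ = "\n>";                 # a record ends where the next header begins
    my $record = <$fh>;
    return unless defined $record;    # empty input
    chomp $record;                    # strips the trailing "\n>" if it is there
    $record .= "\n" unless $record =~ /\n\z/;
    return $record;
}
```

Because $/ is changed with local, the normal line-by-line behaviour is restored as soon as the sub returns.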
