Finding motifs and position of motif in FASTA file - Perl

Question

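Can someone help me with this Perl code? When I run it, nothing happens. There are no errors or anything, which is weird to me. It reads in and opens the file just fine. I believe the problem is in the while loop or the foreach loop, because I honestly don't think I understand them. I am very new at this and have a really bad teacher.

Instructions: Declare a scalar variable called motif and make it AAA. Declare an array variable called locations, which is where the locations of the motif will be stored. Place the gene in a scalar variable. Now search for that motif in the amborella gene. The code should print the position of the motif and the motif found. You will need to write a while loop that searches for the motif and includes push, pos, and -length commands in order to save and report locations. Then you will need a foreach loop to print the locations and the motif. (If it only reports the locations on the first line of the gene, remember this is because the gene is in a scalar variable, which will only read the first line. This is acceptable.)

My code so far: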

#!/usr/bin/perl
use warnings;
use strict;

#Declare a scalar variable called motif and make it AAA.
my$motif="AAA";

#Declare an array variable called locations, which is where the
#locations of the motif will be stored.
my@locations=();
my$foundMotif="";
my$position=();

#Place the gene in a scalar variable.
my$geneFileName = 'amborella.txt';
open(GENEFILE, $geneFileName) or die "Can't read file!";
my$gene = <GENEFILE>;

#Now search for that motif in the amborella gene.
#The code should print the position of the motif and the motif
#found. You will need to write a while loop that searches for the
#motif and includes push, pos, and –length commands in order to
#save and report locations.

while($foundMotif =~ m/AAA/g) {
$position=(pos($foundMotif)-3);
push (@locations, $position);
}

#Then you will need a foreach loop to print the locations and the motif.
foreach $position (@locations){
print "\n Found motif: ", $motif, "\n at position: ", $position;
}

#close the file
close GENEFILE;

exit;

Answer 1

Your program is fine; there is just one simple mix-up.

You are matching against an empty string.

while($foundMotif =~ m/AAA/g) {
  $position = (pos($foundMotif)-3);
  push (@locations, $position);
}

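You are looking for AAA in $foundMotif. But that is an empty string, because you only declared it further up and never assigned anything else to it. Your gene string (disclaimer: I know nothing about bioinformatics) is in $gene. That is what you need to match against.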
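To see why nothing was printed, here is a minimal sketch that counts how often a while loop with a /g match iterates. The helper name count_matches is made up for this illustration, and the second input is the example string used later in this answer, not real sequence data.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times a /g match succeeds on $text.
sub count_matches {
    my ($text, $motif) = @_;
    my $count = 0;
    # \Q...\E makes the motif match as a literal string
    $count++ while $text =~ m/\Q$motif\E/g;
    return $count;
}

print count_matches("", "AAA"), "\n";               # 0: the loop body never runs
print count_matches("ABAABAAABAAAAB", "AAA"), "\n"; # 2
```

With an empty string the match fails on the first try, so the loop body never runs, nothing is pushed, and the foreach loop has nothing to print.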

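Let's go through this step by step. I have simplified your code and put in an example string. I know that is not what a gene looks like, but that doesn't matter. The version below is already fixed.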

use strict;
use warnings;

my $motif = "AAA";

my @locations  = ();

# ... skip reading the file
my $gene = "ABAABAAABAAAAB\n";

while ($gene =~ m/$motif/g) {                     # 1, 2
    my $position = (pos($gene) - length($motif)); # 3, 4
    push(@locations, $position);
}

foreach my $position (@locations) {
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}

If you run this, the code now produces meaningful output.

 Found motif: AAA
 at position: 5
 Found motif: AAA
 at position: 9

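I made four changes:

1. You need to match against $gene, not $foundMotif.
2. Your variable $motif has no purpose if you don't use it for searching. Interpolating it into the pattern makes your program dynamic.
3. Likewise, you need to call pos() on $gene.
4. To keep it dynamic, you shouldn't hard-code the length. Use length($motif) instead of 3.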
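The pos() and length arithmetic can be checked with a small sketch. The sub name motif_starts is invented for this example. It assumes the motif is a literal string (hence \Q...\E), since pos() minus length($motif) only gives the start offset when the match is exactly as long as the motif. As a cross-check it compares against $-[0], Perl's built-in start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based start offsets of each non-overlapping match.
sub motif_starts {
    my ($gene, $motif) = @_;
    my @starts;
    while ($gene =~ m/\Q$motif\E/g) {
        # pos() is the offset just past the match, so subtract the motif length
        my $start = pos($gene) - length($motif);
        # $-[0] is the start offset of the last successful match
        die "offsets disagree" unless $start == $-[0];
        push @starts, $start;
    }
    return @starts;
}

print join(", ", motif_starts("ABAABAAABAAAAB", "AAA")), "\n"; # 5, 9
print join(", ", motif_starts("ABAABAAABAAAAB", "AB")), "\n";  # 0, 3, 7, 12
```

Because the length is taken from the motif itself, the same sub works for a two-character motif without touching the arithmetic.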

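You don't need the $foundMotif variable at all. $position can be lexical to the block it is used in. That means it will be a different variable on each loop iteration, which is good practice. In Perl, you want to give variables the smallest possible scope and declare them only when you need them, not ahead of time.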

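Since this is a learning exercise, iterating over the array separately makes sense. In a real program, if you don't use the positions later on, you could drop the foreach loop and the array and output the positions directly.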
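A version without the array might look roughly like this. It is only a sketch: the sub name report_motif and the filehandle argument are choices made for this example, not part of your assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: print each match position as soon as it is found.
sub report_motif {
    my ($gene, $motif, $out) = @_;
    while ($gene =~ m/\Q$motif\E/g) {
        # $position is scoped to the loop body and is new on every iteration
        my $position = pos($gene) - length($motif);
        print {$out} "Found motif: $motif at position: $position\n";
    }
    return;
}

# Same example string as above, written to standard output
report_motif("ABAABAAABAAAAB\n", "AAA", \*STDOUT);
```

Passing the filehandle in keeps the sub easy to point at a file or an in-memory buffer instead of the terminal.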
