Perl: String in Substring or Substring in String

Question

I'm working with DNA sequences stored in a file. The file is formatted roughly like this, except that it contains multiple sequences:

>name of sequence
EXAMPLESEQUENCEATCGATCGATCG

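I need to determine whether a variable (which is also a sequence) matches any of the sequences in the file and, if it does, the name of the sequence it matches. Because of the nature of these sequences, my whole variable could be contained within one line of the file, or one line of the file could be part of my variable. Right now my code looks like this: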

use warnings;
use strict;
my $filename = "/users/me/file/path/file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open file, "<$filename" or die "Can't find file";
my @Name;
my @Sequence;
my $inx = 0;
while (<file>){
    $Name[$inx] = <file>;
    $Sequence[$inx] = <file>;
    $indx++;
}unless(index($Sequence[$inx], $exampleentry) != -1 || index($exampleentry, $Sequence[$inx]) != -1){
    $returnval = "The sequence matches: ". $Name[$inx];
}
print $returnval;

However, even when I deliberately set $exampleentry to a value that matches an entry in the file, I still get "The sequence does not match any in the file". In addition, when I run the code I get "Use of uninitialized value in index at thiscode.pl line 14, <file> line 3002." and "Use of uninitialized value within @Name in concatenation (.) or string at thiscode.pl line 15, <file> line 3002."

How can I perform this search?

Answer 1

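I assume the purpose of this script is to determine whether $exampleentry matches any record in the file file.txt. Here a record describes one DNA sequence and corresponds to three consecutive lines in the file. $exampleentry matches a sequence if it matches the third line of that record. Here, a match means either that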

$exampleentry is a substring of $line, or
$line is a substring of $exampleentry,

where $line refers to the corresponding line in the file.
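A minimal sketch of this two-way test, using Perl's built-in index, which returns the 0-based position of the substring or -1 when it is absent. The subroutine name is_match and the sample strings are hypothetical. The empty-string check is an assumption added here: index treats the empty string as a substring of every string, so a blank line in the file would otherwise match anything.

```perl
use strict;
use warnings;

# Hypothetical helper: true if either string contains the other.
sub is_match {
    my ($entry, $line) = @_;
    # index() finds '' everywhere, so reject empty strings explicitly.
    return 0 if $entry eq '' || $line eq '';
    # Direction 1: entry inside line. Direction 2: line inside entry.
    return index($line, $entry) != -1 || index($entry, $line) != -1;
}

# Made-up pairs of (variable, file line) to exercise both directions.
for my $pair (['ATCG',         'EXAMPLESEQUENCEATCGATCGATCG'],
              ['GGATCGATCGAA', 'ATCGATCG'],
              ['TTTT',         'ATCGATCGATCG']) {
    my ($entry, $line) = @$pair;
    my $result = is_match($entry, $line) ? 'match' : 'no match';
    print "$entry vs $line: $result\n";
}
```

The first pair matches because the entry occurs inside the line, the second because the line occurs inside the entry, and the third matches in neither direction.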

First, consider the input file file.txt:

>name of sequence
EXAMPLESEQUENCEATCGATCGATCG

In your program you try to read these two lines with three calls to readline. The last call to readline therefore returns undef, because there are no more lines to read.
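The sketch below reproduces the read pattern from the question against the two-line example file. The file is held in an in-memory filehandle (an assumption made only so the example runs without file.txt on disk). One read happens in the while condition and two more inside the loop body, so the third read finds nothing left and gets undef, which is what produces the "uninitialized value" warnings.

```perl
use strict;
use warnings;

# The two-line example file from the question, held in memory
# (an in-memory filehandle stands in for file.txt).
my $data = ">name of sequence\nEXAMPLESEQUENCEATCGATCGATCG\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $iterations      = 0;
my $third_was_undef = 0;

# Same pattern as the question: one read in the while condition,
# then two more reads inside the loop body.
while (my $first = <$fh>) {
    my $second = <$fh>;    # line 2 of the file
    my $third  = <$fh>;    # there is no line 3, so readline returns undef
    $iterations++;
    $third_was_undef = 1 unless defined $third;
    print "first:  $first";
    print "second: ", (defined $second ? $second : "undef\n");
    print "third:  ", (defined $third  ? $third  : "undef\n");
}
close $fh;
print "loop ran $iterations time(s)\n";
```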

It therefore seems reasonable to assume that the last two lines of file.txt are malformed, and that the correct format is:

>name of sequence
EXAMPLESEQUENCE
ATCGATCGATCG

If I have understood you correctly, this should solve your problem:

use feature qw(say);
use strict;
use warnings;

my $filename = "file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open (my $fh, '<', $filename ) or die "Can't find file: $!";
my @name;
my @sequence;
my $inx = 0;
# $_ holds the first line of each three-line record; the next two
# lines are read explicitly and stored without their newlines.
while (<$fh>) {
    chomp ($name[$inx] = <$fh>);
    chomp ($sequence[$inx] = <$fh>);
    if (
        index($sequence[$inx], $exampleentry) != -1
        || index($exampleentry, $sequence[$inx]) != -1
    ) {
        $returnval = "The sequence matches: ". $name[$inx];
        last;
    }
    $inx++;    # advance so each record gets its own array slot
}
close $fh;
say $returnval;

Notes:

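I changed the variable names to follow the snake_case convention. For example, the variable @Name is better written in all lowercase as @name.
I changed the open() call to use the recommended three-argument form; see Don't Open Files in the old way for more information.
Used the say feature instead of print.
Added a chomp after each readline so that newlines are not stored in the arrays.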
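A small sketch of why the chomp matters for the second direction of the test (a line of the file contained in $exampleentry). A line returned by readline keeps its trailing newline, and a needle ending in "\n" cannot be found inside an entry that has no newline. The strings here are made-up examples; the first direction (entry inside line) is not affected, because the newline sits at the end of the haystack.

```perl
use strict;
use warnings;

my $entry = 'GGATCGATCGAA';   # made-up variable to search for

# A line as returned by readline still ends in "\n".
my $raw = "ATCGATCG\n";
my $result_raw = index($entry, $raw) != -1 ? 'match' : 'no match';

# chomp strips the trailing newline, so the line can now be found in $entry.
my $clean = $raw;
chomp $clean;
my $result_chomped = index($entry, $clean) != -1 ? 'match' : 'no match';

print "without chomp: $result_raw\n";
print "with chomp:    $result_chomped\n";
```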

string

perl

dna-sequence