Perl module to check whether a string was created by inserting text into another string

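Problem

I have two strings, $base and $succ, and want to check whether $succ can be created from $base by inserting arbitrary strings at arbitrary positions.

Some examples, written as isSucc($base, $succ):

We can assume that $base and $succ do not contain any (vertical) whitespace. Usually, length($base) < length($succ) < 1000. Good performance is not critical, but it would be nice to have.

I already know one way to implement isSucc*.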

Question

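* My approach

Compute the edit (Levenshtein) distance with a custom cost model (cost(insertion) = 0, cost(deletion) = cost(substitution) = 1), then check whether the edit distance is 0.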
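The cost model above can be sketched as a small dynamic program. This is only an illustration of the idea, not the code from the original post: the names insert_free_distance and is_succ_by_distance are made up here, and the sketch keeps just two rows of the distance table.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: edit distance from $base to $succ where
# insertions are free and deletions and substitutions cost 1 each.
sub insert_free_distance {
    my ($base, $succ) = @_;
    my @b = split //, $base;
    my @s = split //, $succ;

    # Row for the empty prefix of $base: any prefix of $succ can be
    # built from nothing using insertions only, so every cost is 0.
    my @prev = (0) x (scalar(@s) + 1);

    for my $i (1 .. scalar @b) {
        # Reducing $i characters of $base to the empty string takes $i deletions.
        my @cur = ($i);
        for my $j (1 .. scalar @s) {
            my $subst  = $prev[$j - 1] + ($b[$i - 1] eq $s[$j - 1] ? 0 : 1);
            my $delete = $prev[$j] + 1;
            my $insert = $cur[$j - 1];          # insertion costs 0
            push @cur, min($subst, $delete, $insert);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# $succ can be created from $base by insertions alone exactly when
# the distance under this cost model is 0.
sub is_succ_by_distance {
    my ($base, $succ) = @_;
    return insert_free_distance($base, $succ) == 0 ? 1 : 0;
}
```

The table has length($base) * length($succ) cells, so for strings under 1000 characters this stays below a million cell updates per call.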

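Comparing the answers

I wanted to compare the three solutions (loop, greedy matching, and non-greedy matching), but the matching methods often took more than 100 times as long as the loop solution, so I aborted the tests. Nevertheless, or perhaps exactly for that reason, we have a clear winner: the loop solution.
Many thanks to Christoffer Hammarström.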
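For anyone who wants to repeat such a comparison, here is a minimal sketch using the core Benchmark module. It is not the benchmark that was actually run for this post: the helper names by_loop and by_regex, the sample strings, and the iteration count are all assumptions chosen for illustration, and the ratios will depend heavily on the inputs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop solution: walk through $haystack with index().
sub by_loop {
    my ($needles, $haystack) = @_;
    my $found = 0;
    for my $needle (split //, $needles) {
        $found = 1 + index $haystack, $needle, $found;
        return 0 unless $found;
    }
    return 1;
}

# Regex solution: join the quoted characters of $base with $gap,
# which is '.*' for greedy and '.*?' for non-greedy matching.
# The (?:...) wrapper keeps the pattern non-empty for an empty $base.
sub by_regex {
    my ($gap, $base, $succ) = @_;
    my $re = join $gap, map { quotemeta($_) } split //, $base;
    return $succ =~ /(?:$re)/s ? 1 : 0;
}

# Made-up sample data: one string that matches and one that does not.
my $base = 'abcde';
my $hit  = 'xyaxybxycxydxye';
my $miss = 'xyaxybxycxyd';

# Run each variant a fixed number of times and print a comparison table.
cmpthese(100_000, {
    loop       => sub { by_loop($base, $hit); by_loop($base, $miss) },
    greedy     => sub { by_regex('.*',  $base, $hit); by_regex('.*',  $base, $miss) },
    non_greedy => sub { by_regex('.*?', $base, $hit); by_regex('.*?', $base, $miss) },
});
```

Non-matching inputs are worth including, since that is where a backtracking regex does the most extra work.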

sub is_subsequence {
    my ($needles, $haystack) = @_;
    my $found = 0;
    for my $needle (split '', $needles) {               # for each character $needle in $needles
        $found = 1 + index $haystack, $needle, $found;  # find it after the previous one in $haystack
        return 0 unless $found;                         # return false if we can't
    }
    return 1;                                           # return true if we found all $needles in $haystack
}

use Test::More tests => 6;              # 1..6
is 1, is_subsequence("abc", "abcX");    # ok 1
is 1, is_subsequence("abc", "XabYcZ");  # ok 2
is 0, is_subsequence("abc", "abX");     # ok 3
is 0, is_subsequence("abc", "cab");     # ok 4
is 1, is_subsequence("abc", "cabc");    # ok 5
is 1, is_subsequence("", "anything");   # ok 6
{
    my $last_base;
    my $last_re;

    sub is_succ {
        my ($base, $succ) = @_;
        my $re;

        if (defined $last_base && $base eq $last_base) {   # $last_base is undef on the first call
            $re = $last_re;
        }
        else {
            $last_base = $base;
            $last_re = $re = join(".*?", map { quotemeta($_) } split("", $base));
        }

        return $succ =~ /$re/;
    }
}
sub isSucc {
 my($base, $succ)=@_;
 $base=~s/./quotemeta($&).".*?"/ge;
 $succ =~ $base;
}

For the string abc, this builds the regular expression a.*?b.*?c.*? and tests $succ against it.
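To make the construction easier to inspect, the pattern building can be separated from the match. The following is a sketch only; succ_pattern and matches_succ are hypothetical names, and the (?:...) wrapper and the /s flag are additions that are not in the answer above. The wrapper is there because Perl treats a pattern that evaluates to the empty string as "reuse the last successfully matched pattern", which could matter for an empty $base.

```perl
use strict;
use warnings;

# Hypothetical helper: pattern source for $base, each quoted character
# followed by a non-greedy gap, e.g. 'abc' becomes 'a.*?b.*?c.*?'.
sub succ_pattern {
    my ($base) = @_;
    return join '', map { quotemeta($_) . '.*?' } split //, $base;
}

sub matches_succ {
    my ($base, $succ) = @_;
    my $src = succ_pattern($base);
    # (?:...) keeps the pattern non-empty when $base is empty;
    # /s lets "." match newlines as well.
    return $succ =~ /(?:$src)/s ? 1 : 0;
}

print succ_pattern('abc'), "\n";   # prints a.*?b.*?c.*?
print succ_pattern('a.c'), "\n";   # prints a.*?\..*?c.*?
```

The second line of output shows why quotemeta matters: without it, the "." in a.c would match any character instead of a literal dot.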

The power of regular expressions makes this task very simple.

use strict;
use warnings;
use feature 'say';

my $needle = 'abc';

while(<DATA>) {
    chomp;
    say "'$needle' in '$_'" if search_needle($needle,$_);
}

say "'' in 'Anything'" if search_needle('','Anything');

sub search_needle {
    my $needle = shift;
    my $haystack = shift;

    # Quote each character so regex metacharacters in $needle match literally
    my $re = join('.*?', map { quotemeta($_) } split('',$needle));

    return $haystack =~ /$re/;

}

__DATA__
abcX
XabYcZ
abX
cab
cabc

Output

'abc' in 'abcX'
'abc' in 'XabYcZ'
'abc' in 'cabc'
'' in 'Anything'