Better way to reconstruct first instance of a possibly-repeated character within a string without using preg_* functions

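A function I am working on needs to detect the first instance of a character and, if that character is repeated, reconstruct the repeated substring. For example: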

$x = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj'; // ---> function should extract ::::

I would rather not use any preg_* functions, since I avoid them wherever possible (because they are slow). My current solution is:

$char = ":"; // this would be set as necessary

$char_substring = str_repeat($char, strspn(strstr($x, $char), $char)); // yields ---> ::::

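Note that you cannot use strrpos here, because (in this case) there may be more colons toward the other end of the string. You could use explode and then run a for/foreach loop that concatenates the empty elements, or some variation of this: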

$explode = explode($char, $x);
$substring = $char; // explode array should have 1 less empty member than the repeated character, so need to start with char count of 1
$emptyEncountered = false;
for($i = 0, $count = count($explode); $i < $count; $i++) {
    if ($explode[$i] !== '') { // strict check, so a "0" segment is not treated as empty
        if ($emptyEncountered) break;
    } else {
        $emptyEncountered = true;
        $substring .= $char;
    }
}

echo $substring; // ---> ::::

Is there a better way than using preg_*, a for/foreach loop, or str_repeat(strspn(strstr()))?

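A proper preg_* implementation will outperform your explode approach, not only in time but also in memory consumption and in the number of allocations required.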
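
For illustration, here is a minimal sketch of what such a preg_* implementation might look like. The function name first_run_preg is hypothetical, and the sketch assumes the needle is a single literal character. Unlike a {2,} quantifier, the + quantifier used here also returns a first instance that is not repeated.

```php
<?php
// Hypothetical helper name; not part of the original question or answer.
function first_run_preg(string $haystack, string $needle): string
{
    // preg_quote() escapes characters such as "." or "*" so they match literally;
    // the second argument additionally escapes the "/" delimiter.
    $quoted = preg_quote($needle, '/');

    // "+" matches one or more consecutive occurrences, so a single,
    // non-repeated first instance is returned as well.
    if (preg_match('/(?:' . $quoted . ')+/', $haystack, $matches) === 1) {
        return $matches[0];
    }

    return ''; // needle not found, or the match failed
}

echo first_run_preg('fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj', ':'), "\n"; // ::::
```

Returning an empty string when nothing matches is only one possible convention; a caller may prefer null or an exception.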

The only implementation I can think of that is efficient and fits your constraints is a while loop:

$substring = '';

$i = strpos($haystack, $needle);
do {
    $substring .= $needle;
    ++$i;
}
while (isset($haystack[$i]) && $haystack[$i] === $needle); // [] offsets: the {} syntax was removed in PHP 8.0

return $substring;

However, you already have the most efficient implementation yourself:

return str_repeat($needle, strspn(strstr($haystack, $needle), $needle));

It is also functional in nature.
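
To make the composition easier to follow, the one-liner can be unrolled into its three steps. This is only an explanatory sketch using the sample string from the question; the intermediate variable names are made up for illustration.

```php
<?php
$haystack = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj';
$needle   = ':';

// 1. strstr() returns the haystack from the first needle onward
//    (or false when the needle is absent).
$tail = strstr($haystack, $needle); // '::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj'

// 2. strspn() counts how many leading characters of $tail belong to the
//    character set given in $needle.
$length = strspn($tail, $needle); // 4

// 3. str_repeat() rebuilds the run from the needle and that length.
$substring = str_repeat($needle, $length);

echo $substring, "\n"; // ::::
```

Note that strspn() treats its second argument as a set of characters, so this decomposition only behaves as described for a single-character needle.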

Your implementations are missing error handling, and so is my while implementation. In my opinion it definitely needs to be added, but I left it out because you did.
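
As a rough sketch of what that error handling could look like, the variant below checks for a missing needle before doing any work. The function name first_run, the one-byte restriction, and the empty-string result for "not found" are all assumptions made for this example. It also swaps strstr() for strpos() plus the offset parameter of strspn(), which avoids copying the tail of the haystack.

```php
<?php
// Hypothetical helper; name and failure conventions are assumptions.
function first_run(string $haystack, string $needle): string
{
    // strspn() treats its second argument as a character set, so this
    // sketch only accepts a needle of exactly one byte.
    if (strlen($needle) !== 1) {
        throw new InvalidArgumentException('Needle must be exactly one byte.');
    }

    $start = strpos($haystack, $needle);
    if ($start === false) {
        return ''; // needle does not occur at all
    }

    // Count the run of needle bytes beginning at $start.
    $length = strspn($haystack, $needle, $start);

    return substr($haystack, $start, $length);
}

echo first_run('fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj', ':'), "\n"; // ::::
```

Whether a missing needle should yield an empty string, null, or an exception depends on the caller; the empty string was chosen here only because it matches what the one-liner would produce without strict types.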


Results on an i7 machine running Windows 10 with PHP 7.1 TS x64:

$ bench 10000
0.0040609836578369  # str_repeat
0.0044500827789307  # preg_match
0.0046060085296631  # while
0.0050818920135498  # for
0.0052239894866943  # preg_match + preg_quote
0.0079050064086914  # explode

#!/usr/bin/env php
<?php

function bench(callable $cb): void {
    global $argv;

    $limit = 1000;
    if (isset($argv[1]) && is_numeric($argv[1])) {
        $limit = (int) $argv[1];
    }
    elseif (isset($_ENV['LOOP']) && is_numeric($_ENV['LOOP'])) {
        $limit = (int) $_ENV['LOOP'];
    }

    gc_collect_cycles();
    gc_disable();
    $start = microtime(true);
    for ($i = 0; $i < $limit; ++$i) {
        $cb();
    }
    $end = microtime(true);
    gc_enable();
    gc_collect_cycles();

    echo $end - $start, "\n";
}

$haystack = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj';
$needle   = ':';

bench(function () use ($haystack, $needle) {
    return str_repeat($needle, strspn(strstr($haystack, $needle), $needle));
});

bench(function () use ($haystack, $needle) {
    preg_match("/{$needle}{2,}/", $haystack, $matches);

    return $matches[0] ?? '';
});

bench(function () use ($haystack, $needle) {
    $substring = '';
    $i         = strpos($haystack, $needle);

    do {
        $substring .= $needle;
        ++$i;
    }
    while (isset($haystack[$i]) && $haystack[$i] === $needle);

    return $substring;
});

bench(function () use ($haystack, $needle) {
    $substring = '';

    for ($i = strpos($haystack, $needle); isset($haystack[$i]) && $haystack[$i] === $needle; ++$i) {
        $substring .= $needle;
    }

    return $substring;
});

bench(function () use ($haystack, $needle) {
    $needle = preg_quote($needle, '/');

    preg_match("/{$needle}{2,}/", $haystack, $matches);

    return $matches[0] ?? '';
});

bench(function () use ($haystack, $needle) {
    $explode   = explode($needle, $haystack);
    $substring = $needle;
    $empty     = false;

    for ($i = 0, $count = count($explode); $i < $count; $i++) {
        if ($explode[$i] !== '') {
            if ($empty) {
                break;
            }
        }
        else {
            $empty     = true;
            $substring .= $needle;
        }
    }

    return $substring;
});