Better way to reconstruct the first instance of a possibly-repeated character within a string without using preg_* functions

Question

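A function I am working on needs to detect the first instance of a character and, if the character is repeated, reconstruct the repeated substring. For example: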

$x = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj'; // ---> function should extract ::::

I don't want to use any preg_* functions, since I avoid them wherever possible (because they are slow). My current solution is:

$char = ":"; // this would be set as necessary

$char_substring = str_repeat($char, strspn(strstr($x, $char), $char)); // yields ---> ::::

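Note that you can't use strrpos here because (in this case) there may be more colons toward the other end of the string. You could use explode and then run a for or foreach loop that concatenates one character per empty element, or some variation of this: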

$explode = explode($char, $x);
$substring = $char; // explode array should have 1 less empty member than the repeated character, so need to start with char count of 1
$emptyEncountered = false;
for($i = 0, $count = count($explode); $i < $count; $i++) {
    if ($explode[$i]) {
        if ($emptyEncountered) break;
    } else {
        $emptyEncountered = true;
        $substring .= $char;
    }
}

echo $substring; // ---> ::::

Is there a better way than using preg_*, a for/foreach loop, or str_repeat(strspn(strstr()))?

Answer 1

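A proper preg_* implementation will outperform your explode approach, not only in time but also in memory consumption and the number of allocations required.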
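
For reference, a minimal sketch of what such a preg_* version might look like, assuming the goal is the first run of one or more occurrences of the character. The function name first_run_preg and the choice to return an empty string when nothing matches are assumptions made for illustration, not part of the question.

```php
<?php

// Hypothetical helper: returns the first run of $needle in $haystack,
// or '' when $needle does not occur at all.
function first_run_preg(string $haystack, string $needle): string
{
    // preg_quote() escapes regex metacharacters in $needle, including the '/' delimiter.
    $pattern = '/(?:' . preg_quote($needle, '/') . ')+/';

    // preg_match() stops at the first (leftmost) match; '+' is greedy,
    // so the whole run is captured in $matches[0].
    return preg_match($pattern, $haystack, $matches) === 1 ? $matches[0] : '';
}

echo first_run_preg('fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj', ':'), "\n"; // ::::
```

The quantifier matters here: with + a single, non-repeated first occurrence is still returned, whereas {2,} would skip it and match the first run of two or more.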

The only implementation I can think of that is efficient and meets your constraints is a do...while loop:

$substring = '';

$i = strpos($haystack, $needle);
do {
    $substring .= $needle;
    ++$i;
}
while (isset($haystack[$i]) && $haystack[$i] === $needle); // [] instead of {}: curly-brace offsets were removed in PHP 8.0

return $substring;

However, you already have the most efficient implementation yourself:

return str_repeat($needle, strspn(strstr($haystack, $needle), $needle));

It is also functional in style.
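
To make the one-liner easier to follow, here is the same expression split into its three steps, using the sample string from the question. The intermediate variable names ($tail, $length, $run) are only illustrative.

```php
<?php

$haystack = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj';
$needle   = ':';

// 1. strstr() returns the part of $haystack that starts at the first $needle
//    (or false when $needle is absent).
$tail = strstr($haystack, $needle);   // '::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj'

// 2. strspn() counts how many leading characters of $tail belong to the mask $needle.
$length = strspn($tail, $needle);     // 4

// 3. str_repeat() rebuilds the run from that count.
$run = str_repeat($needle, $length);  // '::::'

echo $run, "\n";
```

Keep in mind that strspn() treats its second argument as a set of characters, so this only behaves as intended when $needle is a single character.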

Your implementations are missing error handling, and so is my while implementation. Adding it is definitely required in my opinion, but I am leaving it out because you did.
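
As a rough sketch of what that error handling could look like for the str_repeat variant: the wrapper name first_run, the exception for needles that are not exactly one byte, and the empty-string result for a missing needle are all assumptions here, so adjust them to your own conventions.

```php
<?php

// Hypothetical wrapper around the str_repeat/strspn approach.
function first_run(string $haystack, string $needle): string
{
    // strspn() treats its second argument as a set of characters, so the
    // approach is only valid for a needle that is exactly one byte long.
    if (strlen($needle) !== 1) {
        throw new InvalidArgumentException('Needle must be exactly one byte long.');
    }

    $start = strpos($haystack, $needle);
    if ($start === false) {
        return ''; // needle not found: nothing to reconstruct
    }

    // The third argument of strspn() is the offset at which counting starts,
    // which avoids copying the tail of the string with strstr().
    return str_repeat($needle, strspn($haystack, $needle, $start));
}

echo first_run('fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj', ':'), "\n"; // ::::
```

Checking strpos() against false with === is the important part, because a match at offset 0 is also falsy under a loose comparison.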

Results on an i7 machine running Windows 10 with PHP 7.1 (TS, x64), as elapsed seconds for 10000 iterations of each variant:

$ bench 10000
0.0040609836578369  # str_repeat
0.0044500827789307  # preg_match
0.0046060085296631  # while
0.0050818920135498  # for
0.0052239894866943  # preg_match + preg_quote
0.0079050064086914  # explode

#!/usr/bin/env php
<?php

function bench(callable $cb): void {
    global $argv;

    $limit = 1000;
    if (isset($argv[1]) && is_numeric($argv[1])) {
        $limit = (int) $argv[1];
    }
    elseif (isset($_ENV['LOOP']) && is_numeric($_ENV['LOOP'])) {
        $limit = (int) $_ENV['LOOP'];
    }

    gc_collect_cycles();
    gc_disable();
    $start = microtime(true);
    for ($i = 0; $i < $limit; ++$i) {
        $cb();
    }
    $end = microtime(true);
    gc_enable();
    gc_collect_cycles();

    echo $end - $start, "\n";
}

$haystack = 'fhdfhbc::::dcdcdcuttr482rdvcjv:ducvdk:::chjvdbj';
$needle   = ':';

bench(function () use ($haystack, $needle) {
    return str_repeat($needle, strspn(strstr($haystack, $needle), $needle));
});

bench(function () use ($haystack, $needle) {
    preg_match("/{$needle}{2,}/", $haystack, $matches);

    return $matches[0] ?? '';
});

bench(function () use ($haystack, $needle) {
    $substring = '';
    $i         = strpos($haystack, $needle);

    do {
        $substring .= $needle;
        ++$i;
    }
    while (isset($haystack[$i]) && $haystack[$i] === $needle);

    return $substring;
});

bench(function () use ($haystack, $needle) {
    $substring = '';

    for ($i = strpos($haystack, $needle); isset($haystack[$i]) && $haystack[$i] === $needle; ++$i) {
        $substring .= $needle;
    }

    return $substring;
});

bench(function () use ($haystack, $needle) {
    $needle = preg_quote($needle, '/');

    preg_match("/{$needle}{2,}/", $haystack, $matches);

    return $matches[0] ?? '';
});

bench(function () use ($haystack, $needle) {
    $explode   = explode($needle, $haystack);
    $substring = $needle;
    $empty     = false;

    for ($i = 0, $count = count($explode); $i < $count; $i++) {
        if ($explode[$i]) {
            if ($empty) {
                break;
            }
        }
        else {
            $empty     = true;
            $substring .= $needle;
        }
    }

    return $substring;
});

Tags: php, repeat