How can I split a single word into its letter part and its number part with PHP?

Question

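I have the word AK747, and I use a regex to detect whether a string (at least 2 characters, e.g. AK) is followed by a number (at least two digits, e.g. 747). EDIT: (sorry, I wasn't clear, folks) I need to do the above because: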

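In some cases I need to split the word so that a search matches AK-747. When I search for the string 'AK-747' using the keyword 'AK747', no match is found unless I use levenshtein in the database, so I would prefer to split AK747 into AK and 747.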

My code:

$strNumMatch = preg_match('/^[a-zA-Z]{2,}[0-9]{2,}$/', $value, $match);

if(isset($match[0]))
    echo $match[0];

How can I split it into the array ['AK', '747'] using preg_split() or any other way?

Answer 1

You can try this:

preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE);
$position = $matches[0][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);

This gives you the position of the first run of digits, and you split the string there.
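Note that the snippet assumes the pattern matched. If $value contains no run of two or more digits, $matches stays empty and $matches[0][1] does not exist. A minimal sketch of a guarded version follows; the function name splitAtDigits is a hypothetical helper introduced here for illustration only.

```php
<?php
// Hypothetical helper: split $value at the first run of two or more digits.
// Returns [letters, numbers], or null when no such run exists.
function splitAtDigits(string $value): ?array
{
    if (!preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE)) {
        return null; // no match (or a regex error): nothing to split on
    }
    $position = $matches[0][1]; // byte offset where the digit run starts
    return [substr($value, 0, $position), substr($value, $position)];
}

var_export(splitAtDigits('AK747'));
```

With this approach a hyphenated input such as 'AK-747' keeps its hyphen in the first part ('AK-'), so it only fits inputs without a separator.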

EDIT: Starting from your original approach, it could look like this:

$strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match, PREG_OFFSET_CAPTURE);
if($strNumMatch){
    $position = $match[2][1]; // byte offset of the second capture group (the digits)
    $letters = substr($value, 0, $position);
    $numbers = substr($value, $position);
    $alternative = $letters.'-'.$numbers;
}

Answer 2

You can use ? to make the - optional.

/([A-Za-z]{2,}-?[0-9]{2,})/

https://regex101.com/r/tIgM4F/1
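One way this pattern might be put to work for the search problem is to rewrite both the keyword and the stored value to a single spelling before comparing them. The sketch below is an assumption about usage, not part of the answer: the helper name normalizeCode is hypothetical, and the ^ and $ anchors and the two capture groups are added so that only whole words are rewritten.

```php
<?php
// Hypothetical helper: rewrite "AK747" or "AK-747" to the hyphenated form.
// Strings that do not have the letters-then-digits shape come back unchanged.
function normalizeCode(string $value): string
{
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace('/^([A-Za-z]{2,})-?([0-9]{2,})$/', '$1-$2', $value) ?? $value;
}

var_export(normalizeCode('AK747') === normalizeCode('AK-747'));
```

Comparing the normalized forms makes 'AK747' and 'AK-747' equal without a fuzzy-matching function.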

Answer 3

$input = 'AK-747';

if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
    unset($result[0]);
}

print_r($result);

Output:

Array
(
    [1] => AK
    [2] => 747
)
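The result above is keyed from 1 because only element 0 was removed. If a zero-indexed array like ['AK', '747'] is wanted, as the question asks, one possible variation is to slice off the full match instead; the variable name $parts is introduced here for illustration.

```php
<?php
$input = 'AK-747';
$parts = [];

if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
    // Drop the full match at index 0; array_slice() renumbers integer keys from 0.
    $parts = array_slice($result, 1);
}

var_export($parts);
```

When the input does not match, $parts stays an empty array, so the caller can test it with a simple emptiness check.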

Answer 4

preg_split() is a very sensible and direct call, because you want an indexed array containing two substrings.

Code:

$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));

Output:

array (
  0 => 'AK',
  1 => '747',
)

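\K means "restart the fullstring match". In effect, everything to the left of \K is retained as the first element of the result array, and everything to the right (the optional hyphen) is omitted because it is treated as the delimiter.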

I'll process a small battery of inputs to show what can be done, and explain after the snippet.

Code:

$inputs=['AK747','AK-747','AK-','AK'];  // variations as I understand them
foreach($inputs as $input){
    echo "$input returns: ";
    var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
    echo "\n";
}

Output:

AK747 returns: array (
  0 => 'AK',
  1 => '747',
)
AK-747 returns: array (
  0 => 'AK',
  1 => '747',
)
AK- returns: array (
  0 => 'AK',
)
AK returns: array (
  0 => 'AK',
)

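preg_split() takes a pattern that matches a variable substring and uses each match as a delimiter. If - were present in every input string, explode('-', $input) would be the most appropriate call. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier does in all of the patterns on this page).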

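Now, you can't just use a pattern like /-?/, because that would split the string on every character. To overcome this, you need to tell the regex engine exactly where the optional - is expected. You do this by putting [a-z]{2,} before -? (the single expected delimiter).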
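A small sketch of that failure, using the 'AK-747' input from the snippets above. PREG_SPLIT_NO_EMPTY is added here only to keep the printed result short; the variable name $pieces is illustrative.

```php
<?php
// An optional hyphen also matches the empty string between any two
// characters, so every position in the input becomes a split point.
$pieces = preg_split('/-?/', 'AK-747', -1, PREG_SPLIT_NO_EMPTY);

var_export($pieces);
```

The result contains one element per remaining character (A, K, 7, 4, 7), which is clearly not the two-part split being asked for.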

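The pattern /[a-z]{2,}-?/i does a fine job of finding the correct position of the optional hyphen, but now the problem is that the leading letters of the string are included as part of the delimiting substring.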
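To see the letters being swallowed by the delimiter, the pattern can be run against the same sample input; the variable name $pieces is illustrative.

```php
<?php
// The whole of "AK-" is matched, and therefore removed, as the delimiter.
// What remains is the empty string before it and "747" after it.
$pieces = preg_split('/[a-z]{2,}-?/i', 'AK-747');

var_export($pieces);
```

The first element is an empty string and the letters are gone, which is why the letters have to be excluded from the reported match.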

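Sometimes "lookarounds" can be used in regex patterns to match substrings without consuming them. A "positive lookbehind" is used to match a preceding substring, but "variable-length lookbehinds" are not permitted in PHP (and most other regex flavors). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.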
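A quick way to confirm that the pattern is rejected is to check the return value: preg_split() returns false when the pattern fails to compile. In this sketch the @ operator is used only to silence the accompanying warning so the return value can be inspected, and the variable name $result is illustrative.

```php
<?php
// {2,} has no upper bound, so the lookbehind has no fixed maximum length
// and the pattern does not compile.
$result = @preg_split('/(?<=[a-z]{2,})-?/i', 'AK-747');

var_dump($result === false);
```

In real code the warning should not be suppressed; it is the clearest signal that the pattern itself is at fault.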

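The workaround for this technicality is to "restart the fullstring match" by placing the \K token (aka a lookbehind alternative) just before the optional hyphen. To target only the intended delimiter, the leading letters must be "matched/consumed" and then "discarded"; that is what \K does.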

As for the 3rd and 4th parameters of preg_split()...

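I've set the 3rd parameter to 2. This works just like the limit parameter of explode(): it tells the function not to produce more than 2 output elements. For this case, I could have used NULL or -1 to mean "unlimited", but I could not leave the parameter empty; it must be supplied so that the 4th parameter can be declared.
I've set the 4th parameter to PREG_SPLIT_NO_EMPTY, which tells the function not to generate empty output elements.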
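The effect of both parameters can be seen in a short sketch. The input 'AB-12CD-34' is a made-up string chosen so that the pattern matches twice, and the last line assumes PHP 8.0 or later, where a named argument lets the flags be set without writing out the limit.

```php
<?php
$pattern = '/[a-z]{2,}\K-?/i';

// Without PREG_SPLIT_NO_EMPTY, the empty piece after the trailing hyphen is kept.
$withEmpty = preg_split($pattern, 'AK-');
// With the flag, empty pieces are dropped.
$noEmpty = preg_split($pattern, 'AK-', -1, PREG_SPLIT_NO_EMPTY);

// The limit caps the number of pieces; the last piece holds the unsplit rest.
$unlimited = preg_split($pattern, 'AB-12CD-34');
$limited = preg_split($pattern, 'AB-12CD-34', 2);

// PHP 8.0+ only: a named argument sets the flags and leaves the limit at its default.
$named = preg_split($pattern, 'AK-', flags: PREG_SPLIT_NO_EMPTY);

var_export([$withEmpty, $noEmpty, $unlimited, $limited, $named]);
```

For the single-word inputs in this question the limit of 2 never changes the result; it simply documents that at most two pieces are expected.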

Ta-da!

P.S. A preg_match_all() solution is as simple as using a pipe and two anchors:

$inputs=['AK747','AK-747','AK-','AK'];  // variations as I understand them
foreach($inputs as $input){
    echo "$input returns: ";
    var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
    echo "\n";
}
// same outputs as above
