Regex riddle: want the results 'onetwothree', 'onetwo', 'twothree' but NOT 'two'. Positive lookahead, maybe? For currency extraction.

I'm confused about some basic regex logic. Using a simple example:

(one)(two)(three)

I want the regex to capture:

onetwothree
onetwo
   twothree

but not

   two

The groups should be captured as (one)(two)(three).

I know I can use a positive lookbehind on 'two' so that it only matches when 'one' comes right before it:

(one)?((?<=one)two)(three)? 

But then I can't get the 'twothree' result.
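A minimal sketch to reproduce this, assuming PHP's preg_match and the toy pattern exactly as written above (the $inputs and $results names are only for illustration). The lookbehind (?<=one) can only succeed when 'one' directly precedes 'two', so 'twothree' is rejected along with 'two':

```php
<?php
// Toy pattern from the question: 'two' is only accepted when preceded by 'one'.
$pattern = '/(one)?((?<=one)two)(three)?/';

// Illustrative inputs: the three wanted strings plus the unwanted one.
$inputs = ['onetwothree', 'onetwo', 'twothree', 'two'];

$results = [];
foreach ($inputs as $input) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$input] = preg_match($pattern, $input) === 1;
}

foreach ($results as $input => $matched) {
    echo str_pad($input, 12), $matched ? 'match' : 'no match', PHP_EOL;
}
```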

The real-world need is for currency:

group one:   [$¥£€₹]
group two:   ((?:\d{1,10}[,. ])*\d{1,10})
group three: ( ?(?:[$¥£€₹]|AUD|USD|GBP|EURO?S?\b))

So I want to get these results:

$20,000 AUD
$20,000
 20,000 AUD

but not

 20,000

Thanks for the help!

PS: the groups need to match as (one)(two)(three), (one)(two), or (two)(three).

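Honestly, I would drop any lookaheads/lookbehinds, define all the cases separately, and then combine them. That is more reliable, easier to reason about and understand, and also more efficient.

Something like: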

(^(groupone)(grouptwo)$)|(^(groupone)(grouptwo)(groupthree)$)|(^(grouptwo)(groupthree)$)

For example:

$groupone    = '[$¥£€₹]';
$grouptwo    = '(?:\d{1,10}[,. ])*\d{1,10}';
$groupthree  = ' ?([$¥£€₹]|AUD|USD|GBP|EURO)';
$caseone     = "^($groupone)($grouptwo)$";
$casetwo     = "^($groupone)($grouptwo)($groupthree)$";
$casethree   = "^($grouptwo)($groupthree)$";
// u modifier: match multi-byte symbols such as € as single characters
$allcases    = "/($caseone)|($casetwo)|($casethree)/u";

preg_match($allcases, '20,000 AUD', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '$20,000', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '$20,000 AUD', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '20,000', $matches);
print_r($matches); // empty, preg_match returns 0

To make the results look nicer (skipping empty results, duplicates, extra whitespace, etc.), I also use a cleanup function:

<?php
$groupone    = '[$¥£€₹]';
$grouptwo    = '(?:\d{1,10}[,. ])*\d{1,10}';
$groupthree  = ' ?([$¥£€₹]|AUD|USD|GBP|EURO)';
$caseone     = "^($groupone)($grouptwo)$";
$casetwo     = "^($grouptwo)($groupthree)$";
$casethree   = "^($groupone)($grouptwo)($groupthree)$";
// u modifier: match multi-byte symbols such as € as single characters
$allcases    = "/($caseone)|($casetwo)|($casethree)/u";

function cleanup($arr) {
  # trim leading and trailing whitespace
  $newarr = array_map('trim', $arr);
  # remove empty values
  $newarr = array_filter($newarr, function($value) { return $value !== ''; });
  # remove duplicates
  $newarr = array_unique($newarr);
  # we're only interested in the values, so reindex from 0
  return array_values($newarr);
}

preg_match($allcases, '20,000 AUD', $matches);
print_r(cleanup($matches));

preg_match($allcases, '$20,000', $matches);
print_r(cleanup($matches));

preg_match($allcases, '$20,000 AUD', $matches);
print_r(cleanup($matches));

preg_match($allcases, '20,000', $matches);
print_r(cleanup($matches));

This gets you results like:

Array
(
    [0] => 20,000 AUD
    [1] => 20,000
    [2] => AUD
)
Array
(
    [0] => $20,000
    [1] => $
    [2] => 20,000
)
Array
(
    [0] => $20,000 AUD
    [1] => $
    [2] => 20,000
    [3] => AUD
)
Array
(
)

Edit: if you want the groups to be the same in every case, you can use named groups like this:

$groupone    = '(?<currencyprefix>[$¥£€₹])';
$grouptwo    = '((?:\d{1,10}[,. ])*\d{1,10})';
$groupthree  = ' ?(?<currencypostfix>([$¥£€₹]|AUD|USD|GBP|EURO))';
$caseone     = "^$groupone$grouptwo$";
$casetwo     = "^$grouptwo$groupthree$";
$casethree   = "^$groupone$grouptwo$groupthree$";
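// (?J) allows duplicate group names; the u modifier matches multi-byte symbols such as € as single characters
$allcases    = "/(?J)($caseone)|($casetwo)|($casethree)/u";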

function cleanup($arr) {
  $currencyprefix = isset($arr['currencyprefix']) ? $arr['currencyprefix'] : null;
  $currencypostfix = isset($arr['currencypostfix']) ? $arr['currencypostfix'] :null;

  return array($currencyprefix, $currencypostfix);
}

if (preg_match($allcases, '20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '20,000', $matches))
  print_r(cleanup($matches));

which will give you:

Array
(
    [0] =>
    [1] => AUD
)
Array
(
    [0] => $
    [1] =>
)
Array
(
    [0] => $
    [1] => AUD
)

Alternatively, you can keep the named keys in the final result:

$groupone    = '(?<currencyprefix>[$¥£€₹])';
$grouptwo    = '((?:\d{1,10}[,. ])*\d{1,10})';
$groupthree  = ' ?(?<currencypostfix>([$¥£€₹]|AUD|USD|GBP|EURO))';
$caseone     = "^$groupone$grouptwo$";
$casetwo     = "^$grouptwo$groupthree$";
$casethree   = "^$groupone$grouptwo$groupthree$";
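// (?J) allows duplicate group names; the u modifier matches multi-byte symbols such as € as single characters
$allcases    = "/(?J)($caseone)|($casetwo)|($casethree)/u";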

function cleanup($arr) {
  $newarr = array_filter($arr, function($var){ return !empty($var); });
  return array_filter($newarr, "is_string", ARRAY_FILTER_USE_KEY);
}

if (preg_match($allcases, '20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '20,000', $matches))
  print_r(cleanup($matches));

Result:

Array
(
    [currencypostfix] => AUD
)
Array
(
    [currencyprefix] => $
)
Array
(
    [currencyprefix] => $
    [currencypostfix] => AUD
)

Here is a pure regex answer:

(ONE)?TWO(?(1)(THREE)?|THREE)

Using a conditional, you can check whether the first group matched and, if it did not, make the last group mandatory.

(ONE)?TWO(?(1)(THREE)?|THREE)
^     ^  ^^^^^   ^       ^
1     2    3     4       5

1: Try to match ONE. If it isn't there, no big deal.
2: TWO is mandatory.
3: If the first group (ONE) DID match, then...
4: ...use the first branch, (THREE)?, so THREE is optional.
5: Otherwise use the second branch, so THREE is required.

With this, THREE is only optional in the first branch: if we matched ONE, THREE is optional; if we missed ONE, THREE is mandatory.

ONE
TWO
THREE
ONETWO       // Matches! (e.g: $20,000)
ONETHREE
TWOTHREE     // Matches! (e.g: 20,000 AUD)
ONETWOTHREE  // Matches! (e.g: $20,000 AUD)
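
The list above can be checked with a short PHP sketch. The ^ and $ anchors are an addition here so that each candidate has to match in full, and the variable names are only illustrative:

```php
<?php
// Conditional pattern from the answer, anchored so each candidate must match in full
// (the ^ and $ anchors are an assumption added for this check).
$pattern = '/^(ONE)?TWO(?(1)(THREE)?|THREE)$/';

$candidates = ['ONE', 'TWO', 'THREE', 'ONETWO', 'ONETHREE', 'TWOTHREE', 'ONETWOTHREE'];

// Keep only the candidates the pattern accepts.
$accepted = array_values(array_filter($candidates, function ($candidate) use ($pattern) {
    return preg_match($pattern, $candidate) === 1;
}));

print_r($accepted);
```

Only ONETWO, TWOTHREE and ONETWOTHREE should be left in $accepted.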


For more on conditional subpatterns, see the PCRE pattern syntax section of the PHP manual.
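
Applied to the currency groups from the question, the conditional might look like the sketch below. This is one possible assembly and has not been checked beyond the sample strings; parseMoney is a hypothetical helper, and the u modifier and the ^/$ anchors are assumptions added here. Note that the suffix ends up in group 3 or group 4 depending on which branch ran:

```php
<?php
// Building blocks taken from the question's three groups.
$symbol   = '[$¥£€₹]';
$amount   = '(?:\d{1,10}[,. ])*\d{1,10}';
$currency = ' ?(?:[$¥£€₹]|AUD|USD|GBP|EURO?S?\b)';

// Group 1: optional prefix symbol. Group 2: amount.
// Group 3: optional suffix (prefix present). Group 4: required suffix (prefix absent).
// The u modifier makes multi-byte symbols such as € match as single characters.
$pattern = '/^(' . $symbol . ')?(' . $amount . ')'
         . '(?(1)(' . $currency . ')?|(' . $currency . '))$/u';

// Hypothetical helper: returns the parts of a match, or null when nothing matches.
function parseMoney(string $pattern, string $text): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return [
        'symbol'   => $m[1],
        'amount'   => $m[2],
        // At most one of groups 3 and 4 is non-empty, so concatenating picks it.
        'currency' => trim(($m[3] ?? '') . ($m[4] ?? '')),
    ];
}

foreach (['$20,000 AUD', '$20,000', '20,000 AUD', '20,000'] as $text) {
    echo $text, ' => ', json_encode(parseMoney($pattern, $text)), PHP_EOL;
}
```

The bare '20,000' is the only input for which the helper returns null, because without a prefix symbol the second branch requires a suffix.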

You can use an alternation with an optional group.

\bonetwo(?:three)?|twothree\b
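
A quick check of that pattern against the four strings from the question, as a sketch (the variable names are illustrative). Note that it has no capture groups, so only the full match is available:

```php
<?php
// Alternation from the answer: 'onetwo' with optional 'three', or 'twothree'.
$pattern = '/\bonetwo(?:three)?|twothree\b/';

$results = [];
foreach (['onetwothree', 'onetwo', 'twothree', 'two'] as $input) {
    // Store the matched text, or null when the pattern does not match.
    $results[$input] = preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

var_export($results);
```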


An example with named capture groups and the J flag to allow duplicate subpattern names:

(?P<symbol>[$¥£€₹])(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)(?:\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b)?\b|(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b


$strings = [
    "$20,000 AUD",
    "$20,000",
    "20,000 AUD",
    "$",
    "20,000",
    "AUD"
];
// J allows duplicate group names; u matches multi-byte symbols such as € as single characters
$re = '/(?P<symbol>[$¥£€₹])(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)(?:\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b)?\b|(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b/Ju';
foreach ($strings as $s) {
    $m = preg_match($re, $s, $matches);
    if ($m) {
        print_r($matches);
    }
}

Output:

Array
(
    [0] => $20,000 AUD
    [symbol] => $
    [1] => $
    [amount] => 20,000
    [2] => 20,000
    [currency] => AUD
    [3] => AUD
)
Array
(
    [0] => $20,000
    [symbol] => $
    [1] => $
    [amount] => 20,000
    [2] => 20,000
)
Array
(
    [0] => 20,000 AUD
    [symbol] => 
    [1] => 
    [amount] => 20,000
    [2] => 
    [currency] => AUD
    [3] => 
    [4] => 20,000
    [5] => AUD
)

Filtering out the numeric keys gives:

Array
(
    [symbol] => $
    [amount] => 20,000
    [currency] => AUD
)
Array
(
    [symbol] => $
    [amount] => 20,000
)
Array
(
    [symbol] => 
    [amount] => 20,000
    [currency] => AUD
)
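
One way to drop the numeric keys, as a sketch: filter the match array on string keys with array_filter and ARRAY_FILTER_USE_KEY. namedGroups is a hypothetical helper name, and the u modifier on the pattern is an assumption added here so that multi-byte symbols match as single characters:

```php
<?php
// Pattern from the answer; J allows duplicate group names, u is an added assumption.
$re = '/(?P<symbol>[$¥£€₹])(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)(?:\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b)?\b|(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b/Ju';

// Hypothetical helper: keep only the named (string) keys of a match array.
function namedGroups(array $matches): array
{
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$named = [];
foreach (['$20,000 AUD', '$20,000', '20,000 AUD', '$', '20,000', 'AUD'] as $s) {
    // Only inputs that match contribute an entry.
    if (preg_match($re, $s, $matches)) {
        $named[$s] = namedGroups($matches);
    }
}

print_r($named);
```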