Regex PHP: separate an exact word from a string into different groups

Question

I have tried everything I know, but I still can't figure out how to solve this problem:

I have strings such as:

"--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees"
"--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"

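I want to know whether the tax is "included" or "excluded", and whether each fee is a "%" or a "currency" amount. The problem is that when the currency "usd" is attached to the tax name, as in "vat usd", the currency is not detected.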

How can I separate the currency from the tax name so that they end up in different groups?

This is what I did:

(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us$ )?([ \. 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us$)*((aed|mad|€|us$)+)?([\. 0-9 ]*)(%)?([a-z A-z]*)(.*)?

This is what I get:

Match 1
Full match  0-83    --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees

Group 1.    0-12    --included--

Group 2.    12-29    in selling price

Group 4.    30-33    5 

Group 5.    33-34   %

Group 6.    34-42    vat usd

Group 10.   43-49   10.00 

Group 12.   49-64   packaging fees 

Group 13.   64-82   2 % notifying fees

This is what I want:

Match 1
Full match  0-83    --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees

Group 1.    0-12    --included--

Group 2.    12-29    in selling price

Group 4.    30-33    5 

Group 5.    33-34   %

Group 6.    34-38    vat

Group 7.    38-42    usd

Group 10.   43-49   10.00 

Group 12.   49-64   packaging fees 

Group 13.   64-82   2 % notifying fees
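Why the attempt above cannot produce a separate usd group: the tax-name group ([a-z A-z ?]*) is greedy and absorbs "vat usd", and the later currency alternation (aed|mad|€|us$)* does not list usd at all (its $ is also an end-of-string anchor, not a literal dollar sign). A minimal sketch of the idea for a single item, assuming lowercase ASCII tax names and only the currencies named in the question: give the currency its own required group between the name and the amount. The variable names are illustrative only.

```php
<?php
// Sketch: capture the tax name and the currency in separate groups.
// Assumptions: tax names are lowercase ASCII words, and the only currencies
// are the ones from the question (usd, aed, mad, €, us$).
$item = '5 % vat usd 10.00';

$pattern = '~^(\d+(?:\.\d+)?)\s*(%)?\s*'   // value and optional percent sign
    . '([a-z]+(?:\s+[a-z]+)*?)'             // tax name, lazy: shortest name that is followed by a currency
    . '\s+(usd|aed|mad|€|us\$)\s*'          // currency in its own group ($ escaped to match a literal dollar sign)
    . '(\d+(?:\.\d+)?)~u';                  // amount

if (preg_match($pattern, $item, $m)) {
    // $m[3] holds the tax name, $m[4] the currency
    print_r($m);
}
```

Here group 3 is "vat" and group 4 is "usd", which is the split shown in the desired output.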

Answer 1

Here is a solution:

$s = "--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge";
$results = [];
// One match per --included-- / --not included-- / --excluded-- section
if (preg_match_all('~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su', $s, $m, PREG_SET_ORDER)) {
    foreach ($m as $v) {
        $lastline = array_pop($v); // take Group 3 (the details text) off the match so it can be parsed separately
        // One item per amount, each optionally preceded by a currency ($ in us\$ is escaped to match literally)
        if (preg_match_all('~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui', $lastline, $details)) {
            $results[] = array_merge($v, $details[0]);
        } else {
            $results[] = $v;
        }
    }
}
print_r($results);

See the PHP demo.
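The code above only collects raw strings. As a usage sketch that also covers the "%" vs "currency" part of the question, the same two regexes can be wrapped in a function that labels each item. Assumptions: the function name parseSections and the keys of the returned array are hypothetical; us$ is written as \bus\$ so the dollar sign is matched literally; an item counts as a percentage when its number is directly followed by %.

```php
<?php
// Sketch: reuse the answer's two regexes and label each item.
// parseSections() and its output keys are hypothetical, not part of the answer.
function parseSections(string $s): array
{
    // Section regex from the answer: marker, optional label, text up to the next marker
    $sectionRe = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';
    // Item regex from the answer, with $ escaped in us\$
    $itemRe = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';

    $out = [];
    preg_match_all($sectionRe, $s, $sections, PREG_SET_ORDER);
    foreach ($sections as $sec) {
        preg_match_all($itemRe, $sec[3], $items, PREG_SET_ORDER);
        $parsed = [];
        foreach ($items as $it) {
            $parsed[] = [
                'text' => trim($it[0]),
                // Group 1 is only set when a currency precedes the number
                'currency' => ($it[1] ?? '') !== '' ? $it[1] : null,
                // Percentage: the number is directly followed by optional spaces and %
                'is_percent' => (bool) preg_match('~^\d+(?:\.\d+)?\s*%~', $it[0]),
            ];
        }
        $out[] = ['status' => $sec[1], 'items' => $parsed];
    }
    return $out;
}

print_r(parseSections('--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees'));
```

For the question's first sample string, "5 % vat" and "usd 10.00 packaging fees" come out as separate items, so the currency no longer sticks to the tax name.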

Notes:

The first regex extracts each match you need to parse (see the first regex demo). It means:

(--(?:(?:not )?in|ex)cluded--) - Group 1: a shorter version of (--excluded--|--included--|--not included--), matching --excluded--, --included-- or --not included--
(?:\s+([a-zA-Z ]+))? - an optional sequence: 1+ whitespace chars, then Group 2: 1+ ASCII letters or spaces
:+ - 1 or more colons
\s* - 0+ whitespace chars
((?:(?!--(?:(?:not )?in|ex)cluded--).)*) - Group 3: any char, 0 or more times, as many as possible, that does not start any of the three sequences --excluded--, --included-- or --not included--
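To see what the three groups hold, the first regex can be run on its own. A minimal sketch using a shortened copy of the answer's sample string (shortened only for readability):

```php
<?php
// Sketch: run only the first (section) regex from the answer and print its groups.
$s = '--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night';
$re = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';

preg_match_all($re, $s, $m, PREG_SET_ORDER);
foreach ($m as $section) {
    // [1] marker, [2] optional label, [3] text up to the next marker
    printf("[%s] [%s] [%s]\n", $section[1], $section[2], $section[3]);
}
```

Group 3 of the first section stops right before --excluded--, and Group 2 of the second section is an empty string because that marker is followed directly by the colon.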

Then the Group 3 value has to be parsed further to get all the details. The second regex does that; it matches:

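(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)? - an optional occurrence of
- (\b(?:usd|aed|mad)\b|\B€|\bus\$) - Group 1:
  - \b(?:usd|aed|mad)\b - usd, aed or mad as a whole word
  - \B€ - € not preceded by a word char
  - \bus\$ - us$ not preceded by a word char (the $ must be escaped; an unescaped $ is an end-of-string anchor)
- \s* - 0+ whitespace chars
\d+ - 1+ digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
(?:(?!(?1))\D)* - any non-digit char, 0 or more times, as many as possible, that does not start a match of the Group 1 pattern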
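Because $ is a regex metacharacter, it matters whether us$ is written escaped. A small sketch comparing the second regex with \bus\$ (literal dollar sign) against the same regex with an unescaped \bus$, applied to the Group 3 text of the question's second sample string:

```php
<?php
// Sketch: the item regex with an escaped vs. an unescaped $ in us$.
$details = 'us$ 35.00 express fees 2 % notifying fees';

$escaped   = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';
$unescaped = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';

preg_match_all($escaped, $details, $a);
preg_match_all($unescaped, $details, $b);

print_r($a[0]); // first item keeps its "us$" prefix
print_r($b[0]); // first item starts at the number; the currency is lost
```

With the unescaped version, \bus$ can only match "us" at the very end of the subject, so the currency in front of 35.00 is never captured.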
