Clean up a comma-separated list by regex

Question

I want to clean up a comma-separated list of tags, removing empty tags and extra whitespace. I came up with

$str='first , second ,, third, ,fourth   suffix';
echo preg_replace('#[,]{2,}#',',',preg_replace('#\s*,+\s*#',',',preg_replace('#\s+#s',' ',$str)));

which works well so far, but can it be done in a single replacement?
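To see what each of the three passes contributes, the nested calls can be unrolled. This is only an illustrative sketch of the snippet above; the variable names `$step1` to `$step3` are made up here.

```php
<?php
$str = 'first , second ,, third, ,fourth   suffix';

// Pass 1: collapse every run of whitespace into a single space.
$step1 = preg_replace('#\s+#s', ' ', $str);
// Pass 2: collapse each run of commas, plus surrounding whitespace, into one comma.
$step2 = preg_replace('#\s*,+\s*#', ',', $step1);
// Pass 3: merge commas that ended up adjacent after pass 2 (from ", ,").
$step3 = preg_replace('#[,]{2,}#', ',', $step2);

echo $step1, PHP_EOL; // first , second ,, third, ,fourth suffix
echo $step2, PHP_EOL; // first,second,third,,fourth suffix
echo $step3, PHP_EOL; // first,second,third,fourth suffix
```

The third pass exists only because ", ," is seen by pass 2 as two separate comma runs.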

Answer 1

You can use

preg_replace('~\s*(?:(,)\s*)+|(\s)+~', '$1$2', $str)

Merging the two alternatives into one gives

preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $str)

See the regex demo and the PHP demo. Details:

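- \s*(?:(,)\s*)+ - zero or more whitespace characters, then one or more occurrences of a comma (captured into Group 1, $1) followed by zero or more whitespace characters
- | - or
- (\s)+ - one or more whitespace characters, with the last one captured into Group 2 ($2).

In the second regex, ([,\s]) captures a single comma or whitespace character.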

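The second regex matches:
- \s* - zero or more whitespace characters
- (?:([,\s])\s*)+ - one or more occurrences of
  - ([,\s]) - Group 1 ($1): a comma or a whitespace character
  - \s* - zero or more whitespace characters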
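To make the capturing behaviour described above concrete, the following sketch lists every separator run the second regex matches, together with the character that ends up in Group 1. It is an illustration only; the variable names `$matches`, `$run` and `$kept` are chosen here.

```php
<?php
$str = 'first , second ,, third, ,fourth   suffix';

// PREG_SET_ORDER yields one [full match, group 1] pair per separator run.
preg_match_all('~\s*(?:([,\s])\s*)+~', $str, $matches, PREG_SET_ORDER);

foreach ($matches as [$run, $kept]) {
    // var_export() quotes the strings so the whitespace is visible.
    echo var_export($run, true), ' => ', var_export($kept, true), PHP_EOL;
}
// ' , ' => ','
// ' ,, ' => ','
// ', ,' => ','
// '   ' => ' '
```

Group 1 holds a comma whenever the run contains one, because the greedy \s* swallows the whitespace around it; a run of pure whitespace leaves a single whitespace character in the group.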
See the PHP demo:
```
<?php
 
$str='first , second ,, third, ,fourth   suffix';
echo preg_replace('~\s*(?:(,)\s*)+|(\s)+~', '$1$2', $str) . PHP_EOL;
echo preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $str);
// => first,second,third,fourth suffix
//    first,second,third,fourth suffix
```
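Note that the regex normalizes separators but does not remove one left at the very start or end of the list. If that matters, a wrapper along these lines might help. The function name `cleanTagList` and the extra `trim()` step are assumptions added here, not part of the answer.

```php
<?php
// Hypothetical helper: normalizes separators with the single-pass regex,
// then strips any separator left at either end of the list.
function cleanTagList(string $tags): string
{
    // Fall back to the input if preg_replace() reports an error (null).
    $normalized = preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $tags) ?? $tags;
    // trim() with a character mask removes leading/trailing commas and whitespace.
    return trim($normalized, ", \t\n\r");
}

echo cleanTagList(' , first , second ,, third, ,fourth   suffix , ');
// => first,second,third,fourth suffix
```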
Bonus

This solution works in all NFA regex flavors; here is a JavaScript demo:
```
const str = 'first , second ,, third, ,fourth   suffix';
console.log(str.replace(/\s*(?:(,)\s*)+|(\s)+/g, '$1$2'));
console.log(str.replace(/\s*(?:([,\s])\s*)+/g, '$1'));
```
It can even be adapted for POSIX tools such as sed:
```
sed -E 's/[[:space:]]*(([,[:space:]])[[:space:]]*)+/\2/g' file > outputfile
```
See the online demo.

Answer 2

You can use:

\h*([,\h])[,\h]*

See the demo online. Or:

\h*([,\h])(?1)*

See the demo online.

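\h* - 0+ (greedy) horizontal whitespace characters;
([,\h]) - first capture group, matching a comma or a horizontal whitespace character;
[,\h]* - option 1: 0+ (greedy) commas or horizontal whitespace characters;
(?1)* - option 2: re-apply the first subpattern 0+ (greedy) times.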
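Because \h only matches horizontal whitespace, these patterns leave line breaks alone, whereas a \s-based pattern treats them like any other whitespace. The sketch below contrasts the two on a made-up two-line input; the sample string and variable names are assumptions for illustration.

```php
<?php
// Sample input with a line break surrounded by spaces (chosen for this sketch).
$str = "first , second \n third ,, fourth";

// \h never matches "\n", so the spaces on each side of it are handled separately.
$horizontal = preg_replace('~\h*([,\h])[,\h]*~', '$1', $str);
// \s matches "\n" too, so " \n " is one run that collapses to a single space.
$any = preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $str);

var_dump($horizontal); // "first,second \n third,fourth" (line break kept)
var_dump($any);        // "first,second third,fourth"
```

Which behaviour is preferable depends on whether the tag list can span several lines.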

Replace with the first capture group:

$str='first , second ,, third, ,fourth   suffix';
echo preg_replace('~\h*([,\h])[,\h]*~', '$1', $str) . PHP_EOL;
echo preg_replace('~\h*([,\h])(?1)*~', '$1', $str) . PHP_EOL;

Both print:

first,second,third,fourth suffix
