Regex check for words and words with spaces separating letters

Question

I am checking a string for a set of profanity words.

For example:

$string = 'naughty string';
$words = [
    'naughty',
    'example',
    'words'
];
// implode(separator, array); the legacy join($array, $separator) order fails on PHP 8
$pattern = '/(' . implode('|', $words) . ')/i';
preg_match_all($pattern, $string, $matches);
$matched = implode(', ', $matches[0]);

But I also want to catch profanity whose letters are separated by spaces.

For example:

n a u g h t y

Yes, I could do this by adding each spaced form to the array:

$words = [
    'naughty',
    'n a u g h t y',
    'example',
    'e x a m p l e',
    'words',
    'w o r d s'
];

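But I have a large array of "bad" words, and I was wondering whether there is an easier way to do this?

------ EDIT ------

This isn't meant to be very accurate. In my application, every space becomes a new line. So a string like this: n a u g h t y string would result in: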

n

a

u

g

h

t

y

string

Answer 1

To answer the question as asked, build a pattern like b\s*a\s*d instead of just bad:

$string = 'some bad and b a d and more ugly and very u g l y words';

$words = [
    'bad',
    'ugly'
];

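// Turn each word into its letters joined by \s*, e.g. "bad" becomes b\s*a\s*d.
// The closing \b sits outside the group so it applies to every alternative.
$pattern = '/\b(' . implode('|',
    array_map(function ($w) {
        return implode('\s*', str_split($w));
    }, $words)) . ')\b/i';

print preg_replace($pattern, '***', $string);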
// some *** and *** and more *** and very *** words
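
Since the question's edit says every space becomes a new line, it may help to see that \s* covers that case as well, because \s matches newlines. Below is a minimal sketch of the same idea wrapped in a function. The name buildSpacedPattern is hypothetical, the preg_quote call is an extra precaution that is not part of the pattern above, and the word list is assumed to be plain ASCII (str_split works on bytes).

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern from a plain word list.
function buildSpacedPattern(array $words): string
{
    $alternatives = array_map(function (string $word): string {
        // Escape each letter so regex metacharacters in a word stay literal.
        $letters = array_map(function (string $ch): string {
            return preg_quote($ch, '/');
        }, str_split($word));
        // Allow any run of whitespace (spaces, tabs, newlines) between letters.
        return implode('\s*', $letters);
    }, $words);

    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

$pattern = buildSpacedPattern(['naughty', 'example', 'words']);

// Letters separated by newlines, as described in the question's edit.
$input = "n\na\nu\ng\nh\nt\ny\nstring";
$found = preg_match_all($pattern, $input, $matches);
// $found is 1 and $matches[0][0] holds the newline-separated word.
```

Because whitespace is allowed between any two letters, this also matches partially spaced forms such as "wor ds", so some false positives are to be expected.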

More generally, you cannot reliably remove profanity, especially in a Unicode world. You cannot filter out things like ƒⓤçκ.
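
As a small illustration of that limitation, here is a sketch showing how a single lookalike character slips past the pattern. The sample strings are my own assumptions, not taken from a real word list: the second one swaps the Latin "a" for the Cyrillic "а" (U+0430), which looks identical on screen.

```php
<?php
// Same shape of pattern as above, for the single word "bad".
$pattern = '/\bb\s*a\s*d\b/iu';

$plain     = 'bad';
$lookalike = "b\u{0430}d"; // Cyrillic small letter a instead of Latin a

$plainHit     = preg_match($pattern, $plain);     // 1: the plain word is caught
$lookalikeHit = preg_match($pattern, $lookalike); // 0: visually the same, but not matched
```

Listing every lookalike for every letter quickly becomes impractical, which is why the filter can only ever be best effort.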

Answer 2

Encode the words in the array with \s? to match an optional space, like this:

$words = [
    'n\s?a\s?u\s?g\s?h\s?t\s?y',
    'e\s?x\s?a\s?m\s?p\s?l\s?e',
    'w\s?o\s?r\s?d\s?s',
];

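Or you could use \s* to match any number of whitespace characters.

If you are not familiar with the nuances of regular expressions, I suggest taking a look at https://regex101.com/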
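
To make the difference between the two quantifiers concrete, here is a rough sketch that generates both forms from a plain word instead of typing them by hand. The helper name spacedWord is made up for this example, and it assumes ASCII words.

```php
<?php
// Illustrative helper: join the letters of a word with the given separator pattern.
function spacedWord(string $word, string $separator): string
{
    return implode($separator, str_split($word));
}

$optionalOne = '/' . spacedWord('words', '\s?') . '/i'; // w\s?o\s?r\s?d\s?s
$anyNumber   = '/' . spacedWord('words', '\s*') . '/i'; // w\s*o\s*r\s*d\s*s

$single = 'w o r d s';     // one space between letters
$double = 'w  o  r  d  s'; // two spaces between letters

$results = [
    preg_match($optionalOne, $single), // 1
    preg_match($optionalOne, $double), // 0: \s? allows at most one whitespace character
    preg_match($anyNumber, $single),   // 1
    preg_match($anyNumber, $double),   // 1
];
```

So \s? is enough if letters are only ever separated by a single space or newline, while \s* also covers runs of whitespace.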

php

regex

arrays

profanity

preg-match-all