How to remove all alphanumeric words from the text?

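I'm trying to write a regular expression in PHP that removes only alphanumeric words (words that contain digits), but leaves alone numbers, including those combined with punctuation and similar special characters (e.g. prices, phone numbers).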

Words that should be removed:

1st, H20, 2nd, O2, 3rd, NUMB3RS, Rüthen1, Wrocław2

Words that should not be removed:

0, 5.5, 10, 0, £65, +44, (20), 123, ext:124, 4.4-BSD

My current code is:

$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: 0 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';
$pattern = '/\b\w*\d\w*\b-?/';
echo $text, preg_replace($pattern, " ", $text);

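However, it removes every word that contains a digit, including the plain numbers, prices, and phone numbers.

So far I have also tried the following patterns: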

/(\s+\w{1,2}(?=\W+))|(\s+[a-zA-Z0-9_-]+\d+)/ # Removes digits, etc.
/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\|\/|\-|:|\&|@)]+/ # Doesn't work.
/(\s+\w{1,2}(?=\W+))|(\s+[a-zA-Z0-9_-]+\d+)/ # Removes too much.
/[^\p{L}\p{N}-]+/u                       # It removes only special characters.
/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/ # Removes words.
/ ?\b[^ ]*[0-9][^ ]*\b/i                 # Almost, but removes digits, price, phone.
/\s+[\w-]*\d[\w-]*|[\w-]*\d[\w-]*\s*/    # Almost, but removes digits, price, phone.
/\b\w*\d\w*\b-?/                         # Almost, but removes digits, price, phone.
/[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*/       # Almost, but removes too much.

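I found these patterns on SO (most of the answers there are too specific) and on other sites. They are supposed to remove words containing digits, but they don't.

How can I write a simple regular expression that removes these words without affecting anything else?

Sample text: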

To remove: 1st H20; 2nd O2; 3rd NUMB3RS;

To leave: Digits: -2 0 5.5 10, Prices: 0 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD

Expected output:

To remove: ; ; ; To leave: Digits: -2 0 5.5 10, Prices: 0 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD

How about replacing \b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s* with an empty string?

Demo: https://regex101.com/r/jA2fW3/1

The pattern as PHP code:

$pattern = '/\b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s*/i';
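To see how the lookahead behaves, here is a minimal sketch that applies the pattern to the sample text from the question. The variable name $result is only for illustration. The lookahead demands either letters followed by a digit, or digits followed by a letter, so a token made only of digits can never match.

```php
<?php
// Sample text from the question.
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: 0 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';

// Lookahead: letters then a digit, or (optional letters) digits then a letter.
// \w*\b then consumes the whole word, and \s* eats the whitespace after it.
$pattern = '/\b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s*/i';

$result = preg_replace($pattern, '', $text);
echo $result, PHP_EOL;
```

Because \s* also swallows the space after each removed word, the result should line up with the expected output in the question without a separate whitespace clean-up.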

To match alphanumeric words that contain foreign/accented letters, use this pattern:

$pattern = '/\b(?=[\pL]+\d|[\pL]*\d+[\pL]+)[\pL\w]*\b\s*/iu'; // "u" makes PCRE treat the pattern and subject as UTF-8

演示:https://regex101.com/r/jA2fW3/3
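A short sketch of the accented case, assuming the input is UTF-8 and a reasonably recent PHP. The sample string is made up from the words listed in the question, and the pattern carries the u modifier, without which PCRE would read a multi-byte letter such as ü or ł as separate bytes.

```php
<?php
// Hypothetical sample built from the accented words listed in the question.
$text = 'Remove: Rüthen1 Wrocław2; keep: Rüthen Wrocław 123 £65';

// Same lookahead idea with \pL instead of [a-z]. The "u" modifier is what
// lets \pL, \w and \b work on whole UTF-8 characters rather than bytes.
$pattern = '/\b(?=\pL+\d|\pL*\d+\pL+)[\pL\w]*\b\s*/iu';

$result = preg_replace($pattern, '', $text);
echo $result, PHP_EOL;
```

With these assumptions, only Rüthen1 and Wrocław2 should disappear; the accented words without digits and the plain numbers stay.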

You can modify the regular expression as follows to get the desired output.

$text = preg_replace('/\b(?:[a-z]+\d+[a-z]*|\d+[a-z]+)\b/i', '', $text);

To match any kind of letter from any language, use the Unicode property \p{L}:

$text = preg_replace('/\b(?:\pL+\d+\pL*|\d+\pL+)\b/u', '', $text);
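One thing to watch with this alternation: it does not consume the whitespace around a match, so removing two adjacent words leaves a double space behind. The sketch below applies the \pL pattern to the sample text and then collapses runs of spaces. That second preg_replace is an assumed clean-up step added for illustration, not part of the pattern above.

```php
<?php
// Sample text from the question.
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: 0 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';

// Alternation: letters+digits(+letters), or digits+letters. No whitespace is consumed.
$stripped = preg_replace('/\b(?:\pL+\d+\pL*|\d+\pL+)\b/u', '', $text);

// Assumed clean-up step (illustrative only): collapse runs of two or more spaces.
$result = preg_replace('/ {2,}/', ' ', $stripped);
echo $result, PHP_EOL;
```

Note that the pattern allows only a single run of digits per word, so a hypothetical token such as a1b2 would be left in place.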