Regex match Telegram username and delete whole line in PHP
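I want to match Telegram usernames in the message text and delete the whole line. I tried this pattern, but the problem is that it also matches emails: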
.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*
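The pattern should match all of these lines:
Hi @username how are you?
Hello @username.how are you?
@username.
And it should not match emails like this one:
Hi, send an email to something@domain.com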
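The problem can be reproduced with a small PHP sketch. It runs the pattern above through preg_match() on each sample line. The loop and the output format are only for illustration. All four lines report a match, including the email line, because nothing in the pattern checks the character that comes before @.

```php
<?php
// Pattern from the question, unchanged.
$pattern = '/.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*/';

// Sample lines from the question: the first three should match, the last should not.
$lines = [
    'Hi @username how are you?',
    'Hello @username.how are you?',
    '@username.',
    'Hi, send an email to something@domain.com',
];

foreach ($lines as $line) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $status = preg_match($pattern, $line) === 1 ? 'match' : 'no match';
    printf("%-45s %s\n", $line, $status);
}
```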
.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*
I added [\W] before the @ sign to require a non-word character there.
You can check the result here: https://regex101.com/r/yFGegO/1
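Here is a minimal PHP sketch that tests this pattern one line at a time against the question's sample lines. Lines are tested separately because [\W] can also match "\n". On a multi-line string, that would let a match begin on the previous line. The email line is now rejected. However, the line that starts with @ is also not matched, because [\W] has to consume a character before the @ and there is none.

```php
<?php
// Pattern from this answer: [\W] must consume a non-word character right before '@'.
$pattern = '/.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*/';

$lines = [
    'Hi @username how are you?',
    'Hello @username.how are you?',
    '@username.',
    'Hi, send an email to something@domain.com',
];

// Test each line on its own, so a match cannot start on a neighbouring line.
foreach ($lines as $line) {
    $status = preg_match($pattern, $line) === 1 ? 'match' : 'no match';
    printf("%-45s %s\n", $line, $status);
}
```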
Use
.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*
\B before @ means that there must be a non-word character, or the start of the string, before @.
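To see what \B does here in isolation, this small PHP sketch (the sample strings are illustrative) checks only the \B@ part. The @ sign is itself a non-word character. So \B before it succeeds after a space or at the start of the string, and fails after a word character such as the g in an email address.

```php
<?php
// '@' is a non-word character, so \B right before it holds only when the
// preceding character is also a non-word character, or when '@' starts the string.
$samples = [
    'Hi @username',         // space before '@'
    '@username.',           // '@' at the start of the string
    'something@domain.com', // word character 'g' before '@'
];

foreach ($samples as $sample) {
    $status = preg_match('/\B@/', $sample) === 1 ? 'match' : 'no match';
    printf("%-22s %s\n", $sample, $status);
}
```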
Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\B the boundary between two word chars (\w)
or two non-word chars (\W)
--------------------------------------------------------------------------------
@ '@'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\w{5,32} word characters (a-z, A-Z, 0-9, _)
(between 5 and 32 times (matching the
most amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9]+ any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
[a-zA-Z0-9]+ any character of: 'a' to 'z', 'A' to
'Z', '0' to '9' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
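A minimal PHP sketch of actually deleting the lines with this pattern via preg_replace(). It assumes the message is one string with "\n" line breaks, and the sample text is made up from the question's examples. No s modifier is used, so . never crosses a line break and each match covers exactly one line. Matching lines become empty, but their line breaks stay.

```php
<?php
// Pattern from this answer, unchanged. Without the s modifier '.' does not
// match "\n", so every match stays inside one line.
$pattern = '/.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*/';

$text = "Hi @username how are you?\n"
      . "Hello @username.how are you?\n"
      . "@username.\n"
      . "Hi, send an email to something@domain.com";

// Matching lines are replaced with an empty string; their "\n" separators remain.
$result = preg_replace($pattern, '', $text);

var_dump($result);
```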
There is nothing new under the sun, but basically the other tricks boil down to:
.*?\B@\w{5}.*
or, eventually:
.*?\B@\w{5,64}\b.*
if you want to be more precise, but is that really necessary?
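Both short variants can be compared in a quick PHP sketch that uses preg_grep() on the question's sample lines. Both patterns are written here with the \B@ prefix. On these samples they keep the same three lines and reject the email line.

```php
<?php
$lines = [
    'Hi @username how are you?',
    'Hello @username.how are you?',
    '@username.',
    'Hi, send an email to something@domain.com',
];

// The two short variants, both starting with \B@.
$loose   = '/.*?\B@\w{5}.*/';
$precise = '/.*?\B@\w{5,64}\b.*/';

// preg_grep() returns only the array elements that match, keeping their keys.
print_r(preg_grep($loose, $lines));
print_r(preg_grep($precise, $lines));
```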
Note: if you also want to remove the line break sequence, add \R? at the end of the pattern.
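A short PHP sketch of that note, using the shorter pattern with \R? appended. The sample text is made up. \R matches a line break sequence such as "\r\n", "\n" or "\r". The ? keeps the pattern working on a last line that has no line break after it. Here the matching line disappears completely instead of leaving an empty line behind.

```php
<?php
// Shorter pattern from this answer with \R? appended, so the line break
// after a matching line is removed together with the line.
$pattern = '/.*?\B@\w{5}.*\R?/';

$text = "first line\n"
      . "Hi @username how are you?\n"
      . "Hi, send an email to something@domain.com";

$result = preg_replace($pattern, '', $text);

var_dump($result);
```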