Regex matching a word containing a character exactly two times in a row

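Problem

As the title says, my goal is to find a regex that matches a word if and only if it contains a substring of exactly two consecutive identical characters that is not surrounded by that same character.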

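Test cases

What I've tried so far

The regex [a-zA-Z]*([a-zA-Z])\1[a-zA-Z]* matches words with at least two consecutive identical characters, but belllike still matches because there is no upper bound on the length of the run.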
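To make the shortcoming concrete, here is a minimal sketch using Python's `re` module (an assumption on my part; the backreference syntax is the same in Java). The helper name `has_double` and the words other than belllike are illustrative only.

```python
import re

# The "at least two in a row" attempt, with the backreference \1 written out.
AT_LEAST_TWO = re.compile(r"[a-zA-Z]*([a-zA-Z])\1[a-zA-Z]*")

def has_double(word):
    """Return True if the whole word matches the 'at least two' pattern."""
    return AT_LEAST_TWO.fullmatch(word) is not None

if __name__ == "__main__":
    # "belllike" is accepted even though its only run has three l's.
    for word in ("bell", "belllike", "bike"):
        print(word, has_double(word))
```

The triple l is accepted because the first two l's satisfy the group and its backreference, and the third l is simply consumed by the trailing [a-zA-Z]*.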

I also tried negative lookahead and lookbehind. For a single letter, this could look like this:

[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*

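This regex meets all requirements for the letter a, but neither I nor the people I asked could generalize it to use a capture group and thus work for any letter (copy-pasting the expression 26 times, once per letter, and combining the copies with OR is not the solution I am looking for, even though it would probably work).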
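For reference, the single-letter pattern and the brute-force alternation described above can be sketched as follows, assuming Python's `re` module. The names `pair_of`, `SINGLE_A` and `ANY_LETTER` are hypothetical, and the alternation is generated rather than copy-pasted; it covers both upper and lower case, so it has 52 branches instead of 26.

```python
import re
import string

def pair_of(letter):
    """Pattern fragment: exactly two of `letter`, not touching a third one."""
    c = re.escape(letter)
    return rf"(?<!{c}){c}{c}(?!{c})"

# The hand-written version for the letter a.
SINGLE_A = re.compile(r"[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*")

# The brute-force generalization: one branch per ASCII letter, joined with |.
ALTERNATION = "|".join(pair_of(ch) for ch in string.ascii_letters)
ANY_LETTER = re.compile(rf"[a-zA-Z]*(?:{ALTERNATION})[a-zA-Z]*")

if __name__ == "__main__":
    print(SINGLE_A.fullmatch("baab") is not None)
    print(SINGLE_A.fullmatch("baaab") is not None)
    print(ANY_LETTER.fullmatch("bell") is not None)
    print(ANY_LETTER.fullmatch("belllike") is not None)
```

This works, but the pattern grows with the alphabet, which is exactly what a capture-group solution would avoid.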

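What I'm looking for

A solution to the described problem would of course be great. If it cannot be done with regex, I would be just as happy with an explanation of why it is impossible.

Background

This task was part of an assignment I had to do for university. In a conversation, the professor later said that they had not actually meant to ask for this and were fine with sequences of three or more identical characters being accepted. Nevertheless, trying to find a solution sparked my interest in whether this is actually possible with regex, and if so, how.

Regex flavor to use

Even though the original task was to be done in the Java 8+ regex flavor, I would be fine with a solution in any regex flavor that solves the described problem.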

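You could try:

^(?:.*?(.)(?!\1))?(.)\2(?!\2).*$

See the demo.

  • ^ - Start-of-line anchor.
  • (?: - Open a non-capture group:
    • .*? - 0+ characters other than newline (lazy), up to:
    • (.)(?!\1) - A first capture group holding a single character other than newline, with a negative lookahead containing a backreference to that group, asserting that the character is not followed by the same character.
    • )? - Close the non-capture group and make it optional.
  • (.)\2(?!\2) - The same construct as before, except that this time there is a backreference between the second capture group and the negative lookahead, so the captured character must be followed by exactly one more copy of itself and not by a third.
  • .* - 0+ characters other than newline (greedy);
  • $ - End-of-line anchor.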
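The breakdown above can be exercised with a short sketch, assuming Python's `re` module and the pattern with its backreferences (\1, \2) written out. The helper name `has_exact_pair` and the sample words are illustrative only.

```python
import re

# Optional prefix ending in a character that differs from the next one,
# then a character repeated exactly once and not followed by a third copy.
EXACT_PAIR = re.compile(r"^(?:.*?(.)(?!\1))?(.)\2(?!\2).*$")

def has_exact_pair(word):
    """Return True if `word` contains a run of exactly two equal characters."""
    return EXACT_PAIR.match(word) is not None

if __name__ == "__main__":
    for word in ("bell", "belllike", "aab", "aaab", "bike"):
        print(word, has_exact_pair(word))
```

The optional non-capture group is what handles a pair at the very start of the word (as in aab): there is no preceding character to compare against, so the group is simply skipped.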

Use

^(.)\1(?!\1)|(.?)(?!\2)(.)\3(?!\3)

See proof.

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \2                       what was matched by capture \2
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  \3                       what was matched by capture \3
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \3                       what was matched by capture \3
--------------------------------------------------------------------------------
  )                        end of look-ahead
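As a quick check of the two-branch pattern explained above, here is a sketch assuming Python's `re` module, with the backreferences written as \1, \2 and \3 following the explanation table. Unlike the anchored pattern, this one is meant to be searched for inside the word. The helper name `find_pair` and the sample words are illustrative only.

```python
import re

# Branch 1: a pair at the very start of the string.
# Branch 2: a character that differs from the next one, followed by a pair.
PAIR = re.compile(r"^(.)\1(?!\1)|(.?)(?!\2)(.)\3(?!\3)")

def find_pair(word):
    """Return the matched text, or None if no run of exactly two exists."""
    m = PAIR.search(word)
    return m.group(0) if m else None

if __name__ == "__main__":
    for word in ("bell", "belllike", "aab", "aaab"):
        print(word, find_pair(word))
```

Note that in the second branch the match includes the character before the pair (for bell the matched text is ell), so the pair itself is what group 3 and its backreference cover.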

If the regex flavor supports infinite-width lookbehind:

(.)\1(?!\1)(?<!\1..)

See proof.

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of look-behind
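The lookbehind variant can also be tried in Python, with some caveats. This sketch assumes Python 3.5 or later, where, as far as I know, the standard `re` module accepts a backreference to a fixed-length group inside a lookbehind; here \1.. is always three characters wide, so no variable-width support is needed. Other flavors may reject the pattern. The helper name `find_pair_lb` and the sample words are illustrative only.

```python
import re

# A pair not followed by a third copy (lookahead) and not preceded by one:
# the lookbehind steps back three characters and checks that the character
# just before the pair is not the captured one.
PAIR_LB = re.compile(r"(.)\1(?!\1)(?<!\1..)")

def find_pair_lb(word):
    """Return the matched pair, or None if no run of exactly two exists."""
    m = PAIR_LB.search(word)
    return m.group(0) if m else None

if __name__ == "__main__":
    for word in ("bell", "belllike", "aab", "aaab"):
        print(word, find_pair_lb(word))
```

Here the match is the pair itself, which makes this form convenient when the position of the pair matters rather than just a yes/no answer.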