REGEX - Matching any character which repeats n times

Question

How can I match any character which repeats n times?

Example:

for input: abcdbcdcdd
for n=1:   ..........
for n=2:    .........
for n=3:     .. .....
for n=4:      .  . ..
for n=5:   no matches
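
For reference, the expected rows above can be reproduced without any regex. This is only a sketch of the specification as I read it (a character counts if it occurs at least n times anywhere in the input); the helper name mark_repeated is made up for illustration.

```python
from collections import Counter

def mark_repeated(s, n):
    """Return a row with '.' under every character that occurs at least n times in s."""
    counts = Counter(s)
    return "".join("." if counts[ch] >= n else " " for ch in s)

text = "abcdbcdcdd"
print("input:", text)
for n in range(1, 6):
    row = mark_repeated(text, n)
    # An all-blank row means no character reaches n occurrences.
    print("n=%d:   %s" % (n, row if row.strip() else "no matches"))
```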

After several hours, my best attempt is this expression:

(\w)(?=(?:.*\1){n-1,}) //where n is variable

It uses a lookahead. But the problem with this expression is:

for input: abcdbcdcdd
for n=1    .......... 
for n=2     ... .. .
for n=3      ..  .
for n=4       .
for n=5    no matches

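As you can see, a character is only matched while the lookahead still finds enough copies ahead of it. Look at the n=4 line: the lookahead assertion for d is satisfied, so the first d is matched by the regex. But the remaining d's are not matched, because they don't have 3 more d's ahead of them.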
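
The behaviour described above can be checked with Python's standard re module, which supports backreferences inside lookaheads. This is a sketch of the question's own attempt, not a fix; the function name lookahead_only_row is invented for the example.

```python
import re

def lookahead_only_row(s, n):
    """Mark the positions matched by the question's lookahead-only pattern."""
    # A word character followed by at least n-1 further copies of itself.
    pattern = re.compile(r"(\w)(?=(?:.*\1){%d,})" % (n - 1))
    hits = {m.start() for m in pattern.finditer(s)}
    return "".join("." if i in hits else " " for i in range(len(s)))

for n in range(1, 6):
    print("n=%d    %s" % (n, lookahead_only_row("abcdbcdcdd", n)))
```

For n=4 only position 3 (the first d) is marked, because no later d still has three more d's to its right.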

I hope I stated the problem clearly. Looking forward to your solutions, thanks in advance.

Answer 1

I wouldn't use regular expressions for this. I would use a scripting language such as Python. Try this Python function:

alpha = 'abcdefghijklmnopqrstuvwxyz'
def get_matched_chars(n, s):
    s = s.lower()
    return [char for char in alpha if s.count(char) == n]

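The function returns a list of the characters that appear exactly n times in the string s. Keep in mind that I only included letters in my alphabet. You can change alpha to represent anything that you want to match.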
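
If you don't want to maintain an alphabet by hand, a possible variant is to count whatever characters actually occur. This is a sketch under my own assumptions: the name chars_with_count and the at_least flag are not part of the answer above, and unlike that function it is case-sensitive.

```python
from collections import Counter

def chars_with_count(s, n, at_least=False):
    """List the distinct characters of s that occur exactly n (or at least n) times."""
    counts = Counter(s)  # keeps the order of first appearance
    if at_least:
        return [ch for ch, k in counts.items() if k >= n]
    return [ch for ch, k in counts.items() if k == n]

print(chars_with_count("abcdbcdcdd", 2))                 # exactly twice
print(chars_with_count("abcdbcdcdd", 2, at_least=True))  # twice or more
```

The at_least=True mode corresponds to the question's reading of "repeats n times", while the default mirrors the exact count used in this answer.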

Answer 2

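Regular expressions (and finite automata) cannot count up to an arbitrary integer. They can only count up to a predefined integer, and fortunately this is your case.

Solving this problem is much easier if we first construct a nondeterministic finite automaton (NFA) and then convert it to a regular expression.

So the following automaton, for n=2 and input alphabet = {a,b,c,d},

will match any string in which some character is repeated exactly 2 times. If no character has exactly 2 repetitions (every character appears fewer or more than two times), the string will not match.

Converting it to a regex should look like

"^(?:([^a]*a[^a]*a[^a]*)|([^b]*b[^b]*b[^b]*)|([^c]*c[^c]*c[^c]*)|([^d]*d[^d]*d[^d]*))$"

The alternatives are wrapped in a group so that ^ and $ apply to every branch, not just the first and the last. This can get problematic if the input alphabet is big, so the regex should be shortened somehow, but I can't think of a way right now.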
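
Since writing one branch per letter by hand is error-prone, the expression can be generated. The following is a minimal sketch, assuming a small known alphabet; exactly_n_regex is a name I made up, and the pattern grows linearly with the alphabet, so it does not solve the size problem mentioned above.

```python
import re

def exactly_n_regex(alphabet, n):
    """Build one branch per character: the character exactly n times, anything else around it."""
    branches = []
    for ch in alphabet:
        c = re.escape(ch)            # stay safe for characters such as ']' or '^'
        other = "[^" + c + "]*"      # any run of characters different from ch
        branches.append(other + (c + other) * n)
    # Group the alternatives so the anchors apply to all of them.
    return "^(?:" + "|".join(branches) + ")$"

pattern = re.compile(exactly_n_regex("abcd", 2))
print(bool(pattern.match("abcdbcdcdd")))  # True: b occurs exactly twice
print(bool(pattern.match("aaa")))         # False: a occurs three times
```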

Answer 3

let's look for n=4 line, d's lookahead assertion satisfied and first d matched by regex. But remaining d's are not matched because they don't have 3 more d's ahead of them.

And obviously, without regex, this is a very simple string manipulation problem. I'm trying to do this with and only with regex.

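As with any regex implementation, the answer depends on the regex flavour. You could create a solution with the .NET regex engine, since it allows variable-width lookbehinds.

Also, I'll put a more generalized solution below for Perl-compatible/like regex flavours.

.NET solution

Using a variable-width lookbehind, we can assert back to the beginning of the string and check that there are n occurrences.

regex module in Python

You can implement this solution in Python using the regex module by Matthew Barnett, which also allows variable-width lookbehinds.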

>>> import regex
>>> regex.findall(r'(\w)(?<=(?=(?>.*?\1){2})\A.*)', 'abcdbcdcdd')
['b', 'c', 'd', 'b', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall(r'(\w)(?<=(?=(?>.*?\1){3})\A.*)', 'abcdbcdcdd')
['c', 'd', 'c', 'd', 'c', 'd', 'd']
>>> regex.findall(r'(\w)(?<=(?=(?>.*?\1){4})\A.*)', 'abcdbcdcdd')
['d', 'd', 'd', 'd']
>>> regex.findall(r'(\w)(?<=(?=(?>.*?\1){5})\A.*)', 'abcdbcdcdd')
[]

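Generalized solution

In PCRE or any of the "Perl-like" flavours, there is no solution that would actually return a match for every repeated character, but we can create one, and only one, capture for each character.

Strategy

For any given n, the logic involves:

Early matches: match and capture every character followed by at least n more occurrences.
Final captures:
- match and capture a character followed by exactly n-1 occurrences, and
- also capture each of the following occurrences.

Example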

for n = 3
input = abcdbcdcdd

The character c is Matched only once (as final), and the following 2 occurrences are also Captured in the same match:

abcdbcdcdd
  M  C C

and the character d is (early) Matched once:

abcdbcdcdd
   M

and (finally) Matched one more time, Capturing the rest:

abcdbcdcdd
      M CC
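
The M/C diagrams above can be reproduced with Python's standard re module, which keeps captures made inside a positive lookahead. This is a sketch for n = 3 only, with the pattern written out by hand; PATTERN_N3 and describe_matches are names chosen for the illustration.

```python
import re

# Early branch: 3 more copies ahead.
# Final branch: exactly 2 more copies ahead, each one captured (groups 2 and 3).
PATTERN_N3 = re.compile(
    r"(\w)(?:(?=(?:.*?\1){3})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))"
)

def describe_matches(s):
    """Return (character, position of M, positions of C) for every match."""
    rows = []
    for m in PATTERN_N3.finditer(s):
        captured = [m.start(g) for g in (2, 3) if m.group(g) is not None]
        rows.append((m.group(1), m.start(), captured))
    return rows

for ch, pos, captured in describe_matches("abcdbcdcdd"):
    kind = "final" if captured else "early"
    print(ch, kind, "M at", pos, "C at", captured)
```

The three rows correspond to the three diagrams: c at 2 capturing 5 and 7, d at 3 with no extra captures, and d at 6 capturing 8 and 9.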

Regex

/(\w)                        # match 1 character
(?:
    (?=(?:.*?\1){≪N≫})       # [1] followed by other ≪N≫ occurrences
  |                          #   OR
    (?=                      # [2] followed by:
        (?:(?!\1).)*(\1)     #      2nd occurrence <captured>
        (?:(?!\1).)*(\1)     #      3rd occurrence <captured>
        ≪repeat previous≫    #      repeat subpattern (n-1) times
                             #     *exactly (n-1) times*
        (?!.*?\1)            #     not followed by another occurrence
    )
)/xg

For n = 2, 3 and 4:

/(\w)(?:(?=(?:.*?\1){2})|(?=(?:(?!\1).)*(\1)(?!.*?\1)))/g
/(\w)(?:(?=(?:.*?\1){3})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
/(\w)(?:(?=(?:.*?\1){4})|(?=(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?:(?!\1).)*(\1)(?!.*?\1)))/g
... and so on

Pseudocode to generate the pattern

// Variables: N (int)

character = "(\w)"
early_match = "(?=(?:.*?\1){" + N + "})"

final_match = "(?="
for i = 1; i < N; i++
    final_match += "(?:(?!\1).)*(\1)"
final_match += "(?!.*?\1))"

pattern = character + "(?:" + early_match + "|" + final_match + ")"

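JavaScript code

I'll show an implementation using JavaScript because we can check the result here (and if it works in JavaScript, it works in any Perl-compatible regex flavour, including .NET, Java, Python, Ruby, Perl, and all languages that implemented PCRE).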

var str = 'abcdbcdcdd';
var pattern, re, match, N, i, x;
var output = "";

// We'll show the results for N = 2, 3 and 4
for (N = 2; N <= 4; N++) {
    // Generate pattern (backslashes are doubled inside the string literal)
    pattern = "(\\w)(?:(?=(?:.*?\\1){" + N + "})|(?=";
    for (i = 1; i < N; i++) {
        pattern += "(?:(?!\\1).)*(\\1)";
    }
    pattern += "(?!.*?\\1)))";
    re = new RegExp(pattern, "g");
    
    output += "<h3>N = " + N + "</h3><pre>Pattern: " + pattern + "\nText: " + str;
    
    // Loop all matches
    while ((match = re.exec(str)) !== null) {
        output += "\nPos: " + match.index + "\tMatch:";
        // Loop all captures
        x = 1;
        while (match[x] != null) {
            output += " " + match[x];
            x++;
        }
    }
    
    output += "</pre>";
}

document.write(output);

Python3 code

As requested by the OP, I'm linking to a Python3 implementation in ideone.com
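
In case the link is unavailable, here is a minimal sketch of how the pseudocode could be translated to Python 3 with the standard re module. It is not the linked implementation; build_pattern and repeated_positions are names I picked for the example.

```python
import re

def build_pattern(n):
    """Generate the early-match / final-capture pattern for a given n."""
    early = r"(?=(?:.*?\1){%d})" % n
    final = r"(?=" + r"(?:(?!\1).)*(\1)" * (n - 1) + r"(?!.*?\1))"
    return r"(\w)(?:" + early + "|" + final + ")"

def repeated_positions(s, n):
    """Collect the position of every capture (group 1 plus the n-1 final captures)."""
    positions = set()
    for m in re.finditer(build_pattern(n), s):
        for g in range(1, n + 1):
            if m.group(g) is not None:   # groups 2..n are unset on early matches
                positions.add(m.start(g))
    return sorted(positions)

for n in (2, 3, 4, 5):
    print(n, repeated_positions("abcdbcdcdd", n))
# 2 [1, 2, 3, 4, 5, 6, 7, 8, 9]
# 3 [2, 3, 5, 6, 7, 8, 9]
# 4 [3, 6, 8, 9]
# 5 []
```

Each character ends up captured exactly once, either as group 1 of an early or final match or as one of the extra groups of a final match.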

Answer 4

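Using .NET regular expressions, you can do the following:

(\w)(?<=(?=(?:.*\1){n})^.*) where n is variable

Where:

(\w): any word character, captured in the first group.
(?<=^.*): a lookbehind assertion, which returns us to the start of the string.
(?=(?:.*\1){n}): a lookahead assertion, to see whether the string has n instances of that character.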
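
Python's standard re module only accepts fixed-width lookbehinds, so the pattern above cannot be used there as is. As a rough illustration of the same two steps (go back to the start, then look ahead for n copies), they can be carried out by hand. This is my own sketch, not the .NET behaviour itself, and matches_from_start is a made-up name.

```python
import re

def matches_from_start(s, n):
    """Return every word character of s that occurs at least n times in s."""
    found = []
    for m in re.finditer(r"\w", s):
        ch = re.escape(m.group())
        # Step 1 (the lookbehind): restart at the beginning of the string.
        # Step 2 (the lookahead): from there, require n copies of the character.
        if re.match(r"(?:.*%s){%d}" % (ch, n), s):
            found.append(m.group())
    return found

print(matches_from_start("abcdbcdcdd", 4))
```

Unlike the lookahead-only attempt in the question, every occurrence is tested against the whole string, so all four d's are returned for n = 4.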
