Regex timing out

I'm trying to match lines like these:

foo: anything after the colon can be matched with (.*)+
foo.bar1.BAZ: balh5317{}({}(

This is the regex I'm using:

/^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[\.]?)+)(?:\s)?(?:\:)(?:\s)?((?:.*)+)$/

Please excuse the non-capturing groups and extra parentheses; the pattern is compiled by a builder class.

This works for the examples. The problem appears when I try to feed it a string like this:

foo.bar.baz.beef.stew.ect.and.forward

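I need to be able to check strings like this, but once there are more than a certain number of foo. segments the regex engine times out or runs forever (as far as I can tell).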

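I'm sure this is a logic problem that can be solved, but unfortunately I'm far from having mastered regex, and I'm hoping a more experienced user can shed some light on how to make it more efficient.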

Also, here is a more detailed description of what I need to match:

Property Name: can contain A-z, numbers, and underscores but can't start with a number

<Property Name>.<Property Name>.<Prop...:<Anything after the colon>

Thanks for your time!

Starting with your regex:

^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[\.]?)+)(?:\s)?(?:\:)(?:\s)?((?:.*)+)$


 ^                                  # Anchors to the beginning of the string.
 (                                  # Opens CG1
     (?:                            # Opens NCG
         (?:                        # Opens NCG
             (?:                    # Opens NCG
                 [A-Za-z_]+         # Character class (any of the characters within)
             )                      # Closes NCG
             (?:                    # Opens NCG
                 [0-9]+             # Character class (any of the characters within)
             )?                     # Closes NCG
         )+                         # Closes NCG
         [\.]?                      # Character class (any of the characters within)
     )+                             # Closes NCG
 )                                  # Closes CG1
 (?:                                # Opens NCG
     \s                             # Token: \s (white space)
 )?                                 # Closes NCG
 (?:                                # Opens NCG
     \:                             # Literal :
 )                                  # Closes NCG
 (?:                                # Opens NCG
     \s                             # Token: \s (white space)
 )?                                 # Closes NCG
 (                                  # Opens CG2
     (?:                            # Opens NCG
         .*                         # . denotes any single character, except for newline
     )+                             # Closes NCG
 )                                  # Closes CG2
 $                                  # Anchors to the end of the string.

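I converted [0-9] to \d just to make it easier to read (both match the same thing, at least for ASCII input). I also removed a lot of the non-capturing groups, since they weren't really doing anything.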

^((?:(?:[A-Za-z_]+\d*)+\.?)+)\s?\:\s?((?:.*)+)$

I also merged the \s? and the .* into [\s\S]*, but since that was followed by a + sign, I removed the group and made it [\s\S]+.

^((?:(?:[A-Za-z_]+\d*)+\.?)+)\s?\:([\s\S]+)$
                      ^

Now, I'm not sure what the + above the caret is supposed to do. We can remove it, and with it the non-capturing group around it.

^((?:[A-Za-z_]+\d*\.?)+)\s?\:([\s\S]+)$

Explanation:

 ^                          # Anchors to the beginning of the string.
 (                          # Opens CG1
     (?:                    # Opens NCG
         [A-Za-z_]+         # Character class (any of the characters within)
         \d*                # Token: \d (digit)
         \.?                # Literal .
     )+                     # Closes NCG
 )                          # Closes CG1
 \s?                        # Token: \s (white space)
 \:                         # Literal :
 (                          # Opens CG2
     [\s\S]+                # Character class (any of the characters within)
 )                          # Closes CG2
 $                          # Anchors to the end of the string.
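
As a quick sanity check on the simplification, here is a small Python sketch that runs the original and the simplified pattern over the two sample lines from the question. Python is my own choice (the question doesn't say which language is in use), and the names ORIGINAL, SIMPLIFIED and samples are only illustrative. It also shows two side effects of folding \s? and .* into [\s\S]+: the second group now keeps the space after the colon, and it needs at least one character.

```python
import re

# Pattern from the question, with the / delimiters removed.
ORIGINAL = re.compile(
    r'^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[\.]?)+)(?:\s)?(?:\:)(?:\s)?((?:.*)+)$'
)
# Simplified pattern reached in the steps above.
SIMPLIFIED = re.compile(r'^((?:[A-Za-z_]+\d*\.?)+)\s?\:([\s\S]+)$')

samples = [
    "foo: anything after the colon can be matched with (.*)+",
    "foo.bar1.BAZ: balh5317{}({}(",
]

for line in samples:
    old = ORIGINAL.match(line)
    new = SIMPLIFIED.match(line)
    # Group 1 (the dotted name) is the same for both patterns;
    # group 2 differs only by the leading space after the colon.
    print(old.group(1), "|", repr(old.group(2)), "|", repr(new.group(2)))

# The original accepts an empty value, the simplified pattern does not.
empty_old = ORIGINAL.match("foo:")
empty_new = SIMPLIFIED.match("foo:")
print(empty_old is not None, empty_new is not None)
```

If the leading space matters, one option is to keep the \s? after the colon or strip the captured value afterwards.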

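Now, if you're going to be processing multiple lines, you may want to change [\s\S]+ back to .*, since [\s\S] also matches newlines and would run on into the following lines. There are a few different options, but it depends on the language you're using.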

Honestly, I did this in steps, but the biggest problem was (?:.*)+, which tells the engine to match 0 or more characters, 1 or more times. That is a recipe for catastrophic backtracking (as xufox linked to in the comments).
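
To get a feel for what catastrophic backtracking looks like, here is a rough Python timing sketch. It is my own illustration, not something from the question, and it rests on an assumption worth stating: it uses the name part rather than (?:.*)+, because the name part has the same shape (a repeated group whose body can split the same text in many ways, since the dot is optional) and it is the part an input with no colon actually exercises. The names nested, flat and time_match are made up for the example, and the inputs are kept short on purpose so the script finishes quickly.

```python
import re
import time

# Repeated group with an optional dot: a run of letters can be carved
# into iterations in many different ways, and a failed match tries them all.
nested = re.compile(r'^(?:[A-Za-z_]+\d*\.?)+\s?:')

# One leading letter/underscore per segment and a mandatory dot between
# segments: there is only one way to carve up the name.
flat = re.compile(r'^[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*\s?:', re.ASCII)

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

results = []
for n in (10, 12, 14, 16):
    text = "a" * n  # no colon anywhere, so both patterns must fail
    r_nested, t_nested = time_match(nested, text)
    r_flat, t_flat = time_match(flat, text)
    results.append((r_nested, r_flat))
    print(f"n={n:2d}  nested: {t_nested:.4f}s  flat: {t_flat:.6f}s")
```

On a backtracking engine I would expect the nested timings to grow by roughly a constant factor for every extra character while the flat ones stay tiny, but the exact numbers depend on the engine and the machine.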

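The resulting regex, like your original one, allows names that end with a . (for example foo.bar.: value).

The version below does not. It still matches names like foo.ba5r, with digits in the middle of a segment, if that's okay (the earlier patterns accept those as well, as ba5 followed by r).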

^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s?\:([\s\S]+)$

Explanation:

 ^                          # Anchors to the beginning of the string.
 (                          # Opens CG1
     [A-Za-z_]              # Character class (any of the characters within)
     \w*                    # Token: \w (a-z, A-Z, 0-9, _)
     (?:                    # Opens NCG
         \.                 # Literal .
         [A-Za-z_]          # Character class (any of the characters within)
         \w*                # Token: \w (a-z, A-Z, 0-9, _)
     )*                     # Closes NCG
 )                          # Closes CG1
 \s?                        # Token: \s (white space)
 \:                         # Literal :
 (                          # Opens CG2
     [\s\S]+                # Character class (any of the characters within)
 )                          # Closes CG2
 $                          # Anchors to the end of the string.
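
Here is a minimal Python sketch of how this last pattern might be used, following the breakdown above (a single leading [A-Za-z_] per segment). Python is an assumption on my part, and PROPERTY_LINE and parse_line are hypothetical names. I pass re.ASCII so that \w means exactly a-z, A-Z, 0-9 and _ as described in the comments; without it Python's \w also accepts non-ASCII letters and digits.

```python
import re

# Final pattern from the breakdown above; re.ASCII limits \w to [A-Za-z0-9_].
PROPERTY_LINE = re.compile(
    r'^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s?\:([\s\S]+)$',
    re.ASCII,
)

def parse_line(line):
    """Return (list of property names, raw value) or None if the line doesn't match."""
    m = PROPERTY_LINE.match(line)
    if m is None:
        return None
    # Group 1 is the dotted path, group 2 is everything after the colon.
    return m.group(1).split("."), m.group(2)

print(parse_line("foo.bar1.BAZ: balh5317{}({}("))
print(parse_line("foo.bar.baz.beef.stew.ect.and.forward"))  # no colon
print(parse_line("foo.: trailing dot"))
print(parse_line("1foo: starts with a digit"))
```

The input without a colon now fails quickly, because each segment can only be matched one way: a single leading letter or underscore, then \w*, then a mandatory dot before the next segment.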