Error when substituting '(' with a regex in Python

Question

Hello, I have the following string:

s = r'aaa (bbb (ccc)) ddd'

I want to find the innermost nested parentheses and replace them with {}. Desired output:

s = r'aaa (bbb {ccc}) ddd'

Let's start with the nested (. I use the following regex to find the nested parenthesis, and it works fine:

match = re.search(r'\([^\)]+(\()', s)
print(match.group(1))
(

Then I try to do the substitution:

re.sub(match.group(1), r'\{', s)

But I get the following error:

error: missing ), unterminated subpattern at position 0

I really don't understand what is wrong.

Answer 1

You are passing the wrong thing as the first argument:

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the Match object and must return a replacement string to be used.

The pattern comes first, but because you passed match.group(1), re.sub treats '(' as the pattern, and that is an unmatched, unescaped parenthesis.

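I think what you want is something like this, which keeps the text before the inner parenthesis in a group and puts it back:

re.sub(r'(\([^)]+)\(', r'\1{', s)
'aaa (bbb {ccc)) ddd'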
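To make the cause of the error concrete, here is a minimal sketch that reproduces it and shows one way to turn literal text into a safe pattern with re.escape. The variable name literal_paren is only illustrative, and the exact wording of the error message may differ between Python versions.

```python
import re

s = r'aaa (bbb (ccc)) ddd'

# A bare '(' opens a group that is never closed, so it is not a valid pattern.
try:
    re.compile('(')
except re.error as exc:
    print('invalid pattern:', exc)

# re.escape turns literal text into a pattern that matches it verbatim.
literal_paren = re.escape('(')
print(literal_paren)

# '{' has no special meaning in the replacement string, so no backslash is needed.
print(re.sub(literal_paren, '{', s))
```

Note that this replaces every ( in the string, not only the nested one, so it explains the error rather than solving the original problem.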

Answer 2

You can use

import re
s = r'aaa (bbb (ccc)) ddd'
print( re.sub(r'\(([^()]*)\)', r'{\1}', s) )
# => aaa (bbb {ccc}) ddd

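See the Python demo.

Details:

\( - a ( character
([^()]*) - Group 1 (\1): zero or more characters other than ( and )
\) - a ) character.

The match is replaced with the Group 1 value wrapped in curly braces.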
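As a small sketch of how this behaves on other inputs, the same substitution can be wrapped in a helper (the name brace_innermost is hypothetical). It rewrites every pair of parentheses that contains no other parentheses, so a top-level pair such as (aaa) is rewritten too.

```python
import re

def brace_innermost(text):
    """Rewrite every (...) pair that contains no other parentheses as {...}."""
    return re.sub(r'\(([^()]*)\)', r'{\1}', text)

print(brace_innermost('aaa (bbb (ccc)) ddd'))    # aaa (bbb {ccc}) ddd
print(brace_innermost('(aaa) (bbb (ccc)) ddd'))  # {aaa} (bbb {ccc}) ddd
```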

Answer 3

Based on your sample and your attempt, please try the following Python code, which was written and tested in Python 3.x. There is also an online demo of the regex used in the code.

import re
var = r'aaa (bbb (ccc)) ddd'
print( re.sub(r'(^.*?\([^(]*)\(([^)]*)\)(.*)', r'\1{\2}\3', var) )

The output for the sample shown is:

aaa (bbb {ccc}) ddd

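Explanation of the Python code:

Python's re library is used for the regular expression.
A variable named var is created with the value aaa (bbb (ccc)) ddd.
Python 3's print function then prints the value returned by re.sub, which performs the substitution to produce the required output.

Explanation of the re.sub part: the regex (^.*?\([^(]*)\(([^)]*)\)(.*) (explained below) creates three capturing groups, used only to pick out the required values. The first group captures everything up to, but not including, the ( that immediately precedes ccc; the second group contains ccc; the third group holds the rest of the string. The replacement \1{\2}\3 puts the three groups back and wraps ccc in {..}.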

Explanation of the regex:

(^.*?\([^(]*)  ##1st capturing group: a lazy match from the start of the value up to the first (,
               ##followed by everything before the next (.
\(             ##Literal (, outside any capturing group because we do NOT want it in the output.
([^)]*)        ##2nd capturing group: everything before the next ).
\)             ##Literal ), outside any capturing group because we do NOT want it in the output.
(.*)           ##3rd capturing group: the rest of the value.
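The annotated regex above can also be written directly in Python with re.VERBOSE, which ignores whitespace and # comments inside the pattern. This is just a sketch of the same pattern; the name pattern is illustrative, and printing the groups shows what each one captures for the sample string.

```python
import re

pattern = re.compile(r'''
    (^.*?\([^(]*)   # group 1: start of the string up to the inner opening parenthesis
    \(              # the inner opening parenthesis, not captured
    ([^)]*)         # group 2: the text inside the inner parentheses
    \)              # the inner closing parenthesis, not captured
    (.*)            # group 3: the rest of the string
''', re.VERBOSE)

var = 'aaa (bbb (ccc)) ddd'
match = pattern.match(var)
print(match.groups())                  # ('aaa (bbb ', 'ccc', ') ddd')
print(pattern.sub(r'\1{\2}\3', var))   # aaa (bbb {ccc}) ddd
```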

Answer 4

As your reply to my comment on the question indicates, the following example strings are to be converted as shown:

'(aaa) (bbb (ccc)) ddd'                => '(aaa) (bbb {ccc}) ddd'
'(aaa (eee)) (bbb ccc) ddd'            => '(aaa {eee}) (bbb ccc) ddd'
'(aaa) (ee (ff (gg))) (bbb (ccc)) ddd' => '(aaa) (ee (ff {gg})) (bbb {ccc}) ddd'

We cannot get these results with a single regular expression, but we can get them by running a series of regular expressions

r'\(([^()]*)\)(?=(?:[^()]*\)){n})'

for n = 0, 1, ..., replacing each match with

r'{\1}'

If n = N is the smallest value of n for which there is no match, the desired result is the string produced for n = N-1.
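The procedure just described can be sketched as a loop. This is only an illustration under the stated assumption of balanced parentheses; the function name brace_deepest is hypothetical and not part of the answer.

```python
import re

def brace_deepest(text):
    """Try n = 0, 1, ... and return the result for the last n that matched."""
    result = text
    n = 0
    while True:
        # Same regex as above, with the current n substituted into the quantifier.
        pattern = r'\(([^()]*)\)(?=(?:[^()]*\)){%d})' % n
        replaced, count = re.subn(pattern, r'{\1}', text)
        if count == 0:
            # n is the smallest value with no match, so keep the previous result.
            return result
        result = replaced
        n += 1

print(brace_deepest('(aaa) (bbb (ccc)) ddd'))      # (aaa) (bbb {ccc}) ddd
print(brace_deepest('(aaa (eee)) (bbb ccc) ddd'))  # (aaa {eee}) (bbb ccc) ddd
```

Each pass runs against the original string, not against the previous result, which mirrors the description above.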

I assume the string has balanced parentheses.

The strings 'a(b X c)' and 'a(b(c(d X e)f)g)' have balanced parentheses; 'a(b(c(d X e)fg)' and 'a(b))cd X ((efg)' do not.
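For readers who want to check this assumption before applying the regexes, a balance test can be sketched with a simple counter. The function name is_balanced is hypothetical.

```python
def is_balanced(text):
    """Return True if every ( has a matching ) and no ) comes too early."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared with nothing open.
                return False
    return depth == 0

for sample in ['a(b X c)', 'a(b(c(d X e)f)g)', 'a(b(c(d X e)fg)', 'a(b))cd X ((efg)']:
    print(sample, is_balanced(sample))
```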

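The nesting level of any character in a string equals the number of left parentheses before it minus the number of right parentheses before it (equivalently, for a balanced string, the number of right parentheses after it minus the number of left parentheses after it). The nesting level of 'X' in each of the following strings is shown below: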

String          Nesting level
_____________________________
a X b                 0
a(b X c)              1
a(b(c X d)e)f         2
a(b(c(d X e)f)(g))    3
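The table can be reproduced with a short sketch that counts parentheses to the left of a position. The helper name nesting_level is hypothetical, and the code assumes balanced parentheses.

```python
def nesting_level(text, index):
    """Count the parentheses still open just before text[index]."""
    prefix = text[:index]
    return prefix.count('(') - prefix.count(')')

samples = ['a X b', 'a(b X c)', 'a(b(c X d)e)f', 'a(b(c(d X e)f)(g))']
levels = [nesting_level(sample, sample.index('X')) for sample in samples]
print(levels)  # [0, 1, 2, 3]
```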

Consider the string

'(aaa) (ee (ff (gg))) (bbb (ccc)) ddd'

We first set n = 0 to get

r'\(([^()]*)\)(?=(?:[^()]*\)){0})'

Demo 0 shows that replacing the matches yields the string

{aaa} (ee (ff {gg})) (bbb {ccc}) ddd

Now set n = 1 to produce the regex

\(([^()]*)\)(?=(?:[^()]*\)){1})

Demo 1 shows that replacing the matches yields the string

(aaa) (ee (ff {gg})) (bbb {ccc}) ddd

Next, set n = 2 to produce the regex

\(([^()]*)\)(?=(?:[^()]*\)){2})

Demo 2 and the Python demo show that replacing the matches yields the string

(aaa) (ee (ff {gg})) (bbb (ccc)) ddd

Next, set n = 3 to produce the regex

\(([^()]*)\)(?=(?:[^()]*\)){3})

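Demo 3 shows that there are no matches. We therefore conclude that n = 2 is the largest value that produces a match, so the desired result is the string produced for n = 2: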

(aaa) (ee (ff {gg})) (bbb (ccc)) ddd

Demo 4 shows that ties are possible.
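A tie arises when several groups share the deepest nesting level. As an alternative to the regex series (this is a swapped-in technique, not the method from the answer), a stack-based scan can record the level of every pair and rewrite all pairs at the maximum level. The function name brace_at_max_depth is hypothetical, and the sketch assumes balanced parentheses.

```python
def brace_at_max_depth(text):
    """Rewrite as {...} every (...) pair that sits at the deepest nesting level."""
    stack = []   # indices of the '(' characters that are currently open
    pairs = []   # (open_index, close_index, level) for each matched pair
    for i, ch in enumerate(text):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            open_index = stack.pop()  # raises IndexError on unbalanced input
            pairs.append((open_index, i, len(stack) + 1))
    if not pairs:
        return text
    deepest = max(level for _, _, level in pairs)
    chars = list(text)
    for open_index, close_index, level in pairs:
        if level == deepest:
            chars[open_index] = '{'
            chars[close_index] = '}'
    return ''.join(chars)

print(brace_at_max_depth('(a (b)) (c (d))'))  # (a {b}) (c {d})
```

Here (b) and (d) are both at level 2, so both are rewritten.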
