Replacing Unicode brackets in Python

How can I pad Unicode brackets with spaces?

When I try re.sub, I get an sre_constants.error:

>>> import re
>>> open_punct = ur'([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「'
>>> text = u'this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨'
>>> re.sub(open_punct, ur' ', text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression


When I use re.escape, there is no error, but re.sub does not pad the brackets with spaces:

>>> re.sub(re.escape(open_punct), ur' ', text)
u'this is a weird \u2774sentence \u27c5with some crazy \u27e6punctuations sprinkled\u27e8'
>>> print re.sub(re.escape(open_punct), ur' ', text)
this is a weird ❴sentence ⟅with some crazy ⟦punctuations sprinkled⟨


A plain loop over the characters does produce the output I want:

>>> for p in open_punct:
...     text = text.replace(p, p+' ')
>>> text
u'this is a weird \u2774 sentence \u27c5 with some crazy \u27e6 punctuations sprinkled\u27e8 '
>>> print text
this is a weird ❴ sentence ⟅ with some crazy ⟦ punctuations sprinkled⟨ 


[ and ( have special meaning in a regular expression, and the parser is looking for their ] and ) counterparts.
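As a minimal illustration, here is a Python 3 sketch (there the exception is re.error and the message wording may differ from the Python 2 traceback above). The shortened open_punct is a stand-in for the full string in the question, not the original value:

```python
import re

# Shortened, illustrative stand-in for the question's open_punct:
# ( [ { followed by U+2774, U+27C5, U+27E6, U+27E8.
open_punct = '([{\u2774\u27c5\u27e6\u27e8'

# Used directly as a pattern, '(' opens a group and '[' opens a
# character class; neither is ever closed, so compilation fails.
try:
    re.compile(open_punct)
    compiled = True
except re.error:
    compiled = False

print(compiled)  # False
```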

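If you want open_punct to act as a character class, you need to wrap all of the characters in [..] anyway, at which point ( and [ can be included without escaping. As written, your escaped 'expression' only matches text that contains all of those characters in sequence.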
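The difference between the two readings can be seen in a small Python 3 sketch. The shortened open_punct is again an assumption for illustration; the text is the sentence from the question, written with escapes:

```python
import re

# Illustrative subset of the bracket characters from the question.
open_punct = '([{\u2774\u27c5\u27e6\u27e8'
text = ('this is a weird \u2774sentence \u27c5with some crazy '
        '\u27e6punctuations sprinkled\u27e8')

# Escaped, the pattern is one literal 7-character sequence,
# which never occurs in the text.
literal_hits = re.findall(re.escape(open_punct), text)

# Wrapped in [...], the same characters form a class that
# matches any single one of them.
class_hits = re.findall('[' + open_punct + ']', text)

print(literal_hits)     # []
print(len(class_hits))  # 4
```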

Since you also want to keep the matched bracket and refer to it as a capturing group (\1) in the replacement, add parentheses:

>>> re.sub(u'([{}])'.format(open_punct), ur'\1 ', text)
u'this is a weird \u2774 sentence \u27c5 with some crazy \u27e6 punctuations sprinkled\u27e8 '
>>> print re.sub(u'([{}])'.format(open_punct), ur'\1 ', text)
this is a weird ❴ sentence ⟅ with some crazy ⟦ punctuations sprinkled⟨ 

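Note that using re.escape() is still a good idea, in case the set of characters you want to match contains -, ] or a backslash sequence. Inside a character class, - defines a range of characters (0-9 means all digits), ] ends the class, and \d, \w, \s, etc. all stand for predefined groups of characters: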

re.sub(u'([{}])'.format(re.escape(open_punct)), ur'\1 ', text)
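Putting the pieces together, a Python 3 version might look like the sketch below. The helper name pad_after and the shortened character list are hypothetical, introduced only for illustration. The second half shows why escaping matters once a - ends up between two other characters:

```python
import re

def pad_after(chars, text):
    """Insert a space after every occurrence of any character in chars.

    Hypothetical helper: escapes chars so that -, ] and backslashes
    are treated literally inside the character class.
    """
    pattern = '([{}])'.format(re.escape(chars))
    return re.sub(pattern, r'\1 ', text)

text = ('this is a weird \u2774sentence \u27c5with some crazy '
        '\u27e6punctuations sprinkled\u27e8')
padded = pad_after('([{\u2774\u27c5\u27e6\u27e8', text)
print(padded)

# Unescaped, '+-/' inside [...] is the range U+002B..U+002F,
# which also covers ',' and '.'.
unescaped = re.findall('[+-/]', 'a,b.c')
# Escaped, only the three literal characters +, - and / can match.
escaped = re.findall('[{}]'.format(re.escape('+-/')), 'a,b.c')

print(unescaped)  # [',', '.']
print(escaped)    # []
```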