Removing wrapped line returns

I'd like to remove line returns from text that's wrapped to a certain width. For example:

import re
x = 'the meaning\nof life'
re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'

I want to return 'the meaning of life'. What am I doing wrong?

You need to escape that \ like this:

>>> import re
>>> x = 'the meaning\nof life'

>>> re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'

>>> re.sub("([,\w])\n(\w)", "\\1 \\2", x)
'the meaning of life'

>>> re.sub("([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'
>>>

If you don't escape it, Python reads \1 in the string literal as an octal escape, so the replacement becomes the character \x01:

>>> '\1'
'\x01'

That's why we need to write '\\1' or use a raw string such as r'\1' to get a literal \ through to a Python regular expression.
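The difference between the three spellings of the replacement string can be checked directly. This is a minimal sketch; the names pattern, broken and fixed are only illustrative and are not part of the original answer.

```python
import re

x = 'the meaning\nof life'
pattern = r"([,\w])\n(\w)"

# Without escaping, '\1' is the octal escape for the control character \x01.
assert '\1' == '\x01'

# Doubling the backslash and using a raw string produce the same two characters.
assert '\\1' == r'\1'
assert len(r'\1') == 2

broken = re.sub(pattern, "\1 \2", x)   # control characters end up in the result
fixed = re.sub(pattern, r"\1 \2", x)   # backreferences are expanded by re.sub

print(repr(broken))
print(repr(fixed))
```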

However, note this remark from this answer:

If you're putting this in a string within a program, you may actually need to use four backslashes (because the string parser will remove two of them when "de-escaping" it for the string, and then the regex needs two for an escaped regex backslash).

And from the documentation:

As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's usage of the same character for the same purpose in string literals.

Let's say you want to write a RE that matches the string \section, which might be found in a LaTeX file. To figure out what to write in the program code, start with the desired string to be matched. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, resulting in the string \\section. The resulting string that must be passed to re.compile() must be \\section. However, to express this as a Python string literal, both backslashes must be escaped again.
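The \section case from the quoted documentation can be tried out as follows. This is a small sketch; the sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample input containing a literal backslash.
text = r"Intro \section{Background} more text"

# The regex engine must receive \\section (an escaped backslash, then "section").
# In a normal string literal each of those two backslashes is doubled again;
# in a raw string literal they are written as-is.
plain_literal = "\\\\section"
raw_literal = r"\\section"

assert plain_literal == raw_literal

match = re.search(raw_literal, text)
print(match.group(0))
```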


Another way, suggested by brittenb; you don't need a regex in this case:

>>> x = 'the meaning\nof life'
>>> x.replace("\n", " ")
'the meaning of life'
>>> 

Use raw string literals; both Python string literal syntax and regex interpret backslashes. \1 in a Python string literal is interpreted as an octal escape, but not in a raw string literal:

re.sub(r"([,\w])\n(\w)", r"\1 \2", x)

The alternative would be to double all backslashes so that they reach the regex engine as such.
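For comparison, here is roughly what doubling every backslash looks like for the pattern itself. It is only a sketch, and the names doubled_pattern and raw_pattern are illustrative.

```python
import re

x = 'the meaning\nof life'

# Every backslash is doubled so the regex engine sees \w, \n and \1 unchanged.
doubled_pattern = "([,\\w])\\n(\\w)"
raw_pattern = r"([,\w])\n(\w)"

# Both spellings produce exactly the same pattern string.
assert doubled_pattern == raw_pattern

result = re.sub(doubled_pattern, "\\1 \\2", x)
print(result)
```

The raw string form is usually easier to read, which is why it is the spelling used in the rest of this answer.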

See the Backslash plague section of the Python regex HOWTO.

Demo:

>>> import re
>>> x = 'the meaning\nof life'
>>> re.sub(r"([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'

It might be easier just to split on newlines; use the str.splitlines() method, then re-join with spaces using str.join():

' '.join(x.splitlines())

But admittedly this won't distinguish between newlines between words and extra newlines elsewhere.
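To make that limitation concrete, the sketch below compares the splitlines() approach with a lookaround-based substitution that only touches a newline with non-whitespace on both sides. The helper name unwrap and the sample text are hypothetical, and the lookaround pattern is a swapped-in variant, not the pattern from the question.

```python
import re

def unwrap(text):
    # Replace a newline only when non-whitespace sits directly on both sides,
    # so blank lines (paragraph breaks) are left untouched.
    return re.sub(r"(?<=\S)\n(?=\S)", " ", text)

# Hypothetical wrapped text with a blank line between two paragraphs.
wrapped = "the meaning\nof life\n\nanother\nparagraph"

joined = ' '.join(wrapped.splitlines())   # flattens the paragraph break too
unwrapped = unwrap(wrapped)               # keeps the blank line

print(repr(joined))
print(repr(unwrapped))
```

Whether a blank line should survive depends on the input, so treat the lookaround version as one option rather than a drop-in replacement.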