How to remove all kinds of line breaks and formatting from strings in Python

Question

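I know the classic way of dealing with line breaks, tabs, etc. is to use .strip() or .replace('\n', ''). But sometimes there are special cases where these methods fail, e.g.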

         'H\xf6cke\n\n:\n\nDie'.strip()

  gives: 'H\xf6cke\n\n:\n\nDie'

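How can I catch these rare cases, which would otherwise have to be covered one by one (e.g. with .replace('*', ''))? The above is just one example I came across.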

Answer 1

In [1]: import re

In [2]: text = 'H\xf6cke\n\n:\n\nDie'

In [3]: re.sub(r'\s+', '', text)
Out[3]: 'Höcke:Die'

\s:

Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched (but the flag affects the entire regular expression, so in such cases using an explicit [ \t\n\r\f\v] may be a better choice).

'+'

Causes the resulting RE to match 1 or more repetitions of the preceding RE.
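
To make the difference between the default Unicode behaviour and the ASCII flag concrete, here is a small sketch. The sample string (with an added tab, non-breaking space and carriage return) and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample: line breaks, a tab, a non-breaking space (\xa0), and \r\n.
text = 'H\xf6cke\n\n:\t\xa0Die\r\n'

# Default (Unicode) mode: \s also matches the non-breaking space.
no_ws = re.sub(r'\s+', '', text)

# With re.ASCII, \s only matches [ \t\n\r\f\v], so \xa0 is left in place.
ascii_only = re.sub(r'\s+', '', text, flags=re.ASCII)

# Replace each run of whitespace with one space instead of deleting it.
collapsed = re.sub(r'\s+', ' ', text).strip()

print(repr(no_ws))       # 'Höcke:Die'
print(repr(ascii_only))  # 'Höcke:\xa0Die'
print(repr(collapsed))   # 'Höcke : Die'
```

Replacing with a single space rather than an empty string is usually the safer choice for running text, since deleting whitespace outright glues neighbouring words together.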

Answer 2

If you don't want to import anything, use replace:

a = "H\xf6cke\n\n:\n\nDie"
print(a.replace("\n",""))

# Höcke:Die
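
Note that replace("\n", "") only removes the one character it is given. As a sketch of an import-free alternative that covers other whitespace too, str.split() with no arguments splits on runs of whitespace, and the pieces can be joined back together. The sample string below is a made-up variant of the one above, with a carriage return and a tab added.

```python
# Hypothetical sample with \n, \r\n and a tab.
a = "H\xf6cke\n\n:\r\n\tDie"

# replace() removes only the exact character it is given.
only_newlines = a.replace("\n", "")

# split() with no arguments splits on any run of whitespace.
all_ws = "".join(a.split())

# Join with a space to keep the pieces separated.
spaced = " ".join(a.split())

print(repr(only_newlines))  # 'Höcke:\r\tDie'
print(repr(all_ws))         # 'Höcke:Die'
print(repr(spaced))         # 'Höcke : Die'
```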

Answer 3

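From the documentation of strip:
Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead.

In other words, strip only touches the two ends of the string, which is why it did not remove the '\n' characters in the middle of the text.

If you want to remove every occurrence of '\n', you can use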

'H\xf6cke\n\n:\n\nDie'.replace('\n','')
Output: Höcke:Die
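
The behaviour described above can be checked with a short sketch. The strings and variable names are illustrative only. The last part uses str.splitlines(), which recognises several line-break characters besides '\n', as one possible way to drop line breaks while leaving ordinary spaces alone.

```python
# Hypothetical sample with whitespace at both ends and in the middle.
s = '\n\tH\xf6cke\n\n:\n\nDie \n'

# strip() removes leading and trailing whitespace only.
stripped = s.strip()

# With a chars argument, strip() removes those characters from the ends instead.
custom = 'xxHi\nxx'.strip('x')

# splitlines() splits on \n, \r, \r\n and other line boundaries such as \u2028.
mixed = 'a\r\nb\u2028c\nd e'
no_breaks = ''.join(mixed.splitlines())

print(repr(stripped))   # 'Höcke\n\n:\n\nDie'
print(repr(custom))     # 'Hi\n'
print(repr(no_breaks))  # 'abcd e'
```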


python

nlp

strip

web-scraping