How to remove \n1, \n2, \n3 etc. from strings in a Python list?

Question

I created a Python list, question_text_list, containing strings (text) retrieved from a CSV file:

['text1', 'text2', ..., 'text100000']

One of the texts in the list looks like this:

'in star trek 2013 why did they \n\nspoilers\nspoilers\nspoilers\nspoilers\n\n1make warping look quite a bit like an hyperspace jump\n2what in the world were those bright particles as soon as they jumped\n3why in the world did they make it possible for two entities to react in warp space in separate jumps\n4why did spock get emotions for this movie\n5what was the point of hiding the enterprise underwater\n6when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth\n7how did the ship enter earths atmosphere when it wasnt even in orbit\n8when scotty opened the door of the black ship how come pike and khan didnt slow down'

I applied the following command, hoping to remove \n1, \n2, ..., \n8 as well as \nspoilers:

    question_text_list = [x.replace('\n*',' ').replace('\nspoilers','') for x in question_text_list]

I get the following output, which is not what I want: the \n in \n1, \n2 has been removed, but the trailing digits such as "1" and "2" are still there:

'in star trek 2013 why did they 1make warping look quite a bit like an hyperspace jump2what in the world were those bright particles as soon as they jumped3why in the world did they make it possible for two entities to react in warp space in separate jumps4why did spock get emotions for this movie5what was the point of hiding the enterprise underwater6when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth7how did the ship enter earths atmosphere when it wasnt even in orbit8when scotty opened the door of the black ship how come pike and khan didnt slow down'

Question: how do I remove all newlines together with their trailing digits, such as \n1, \n2, ..., in Python?

Answer 1

You can use the following to strip trailing newlines from each item:

    li = [...]  # your original list

    li = [item.rstrip('\n') for item in li]
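
As a quick check of what this does, here is a minimal sketch on a shortened, hypothetical sample string (not the asker's full data). `str.rstrip('\n')` only strips newlines at the end of each string, so markers such as `\n1` and `\n2` in the middle are left in place:

```python
# Shortened, hypothetical sample; the real list comes from the CSV file.
li = ['why did they \n\nspoilers\n1make warping\n2what were those particles\n']

stripped = [item.rstrip('\n') for item in li]

# Only the trailing newline is gone; the interior '\n1' and '\n2' remain.
print(repr(stripped[0]))
```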

Answer 2

A simple regular expression does the trick:

    import re

    text = 'in star trek 2013 why did they \n\nspoilers ...' # leaving out for brevity
    article = re.sub(r'\n[0-9]?(spoilers)?', '', text)

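The regular expression \n[0-9]?(spoilers)? means:

\n => matches a newline character

[0-9]? => matches any single digit from 0 to 9, which may or may not be present (the ? part)

(spoilers)? => matches the whole word spoilers, which may or may not be present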
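
To apply the same pattern to every string in the list, something like the following sketch could be used. The list contents here are a shortened, hypothetical stand-in for the data read from the CSV file, and `pattern` and `cleaned` are illustrative names:

```python
import re

# Same pattern as above, compiled once so it can be reused for every string.
pattern = re.compile(r'\n[0-9]?(spoilers)?')

# Shortened, hypothetical stand-in for the list read from the CSV file.
question_text_list = [
    'why did they \n\nspoilers\nspoilers\n\n1make warping\n2what were those particles',
]

cleaned = [pattern.sub('', text) for text in question_text_list]
print(cleaned[0])  # why did they make warpingwhat were those particles
```

Because the replacement is the empty string, the numbered items end up joined without a separator; passing `' '` as the replacement keeps them apart. Also note that `[0-9]?` matches at most one digit, so a marker like `\n10` would leave the `0` behind.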

Answer 3

You should use a regular expression for this:

Assuming your variable is named text, you can do the following:

    import re
    text = re.sub(r'\n\d', ' ', text).replace("\nspoilers","").replace("\n","")

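This first replaces every \n followed by a digit (\n1, \n2, and so on) with a space; the second replace then removes \nspoilers, and the third removes any remaining \n. The result looks like this: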

'in star trek 2013 why did they  make warping look quite a bit like an hyperspace jump what in the world were those bright particles as soon as they jumped why in the world did they make it possible for two entities to react in warp space in separate jumps why did spock get emotions for this movie what was the point of hiding the enterprise underwater when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth how did the ship enter earths atmosphere when it wasnt even in orbit when scotty opened the door of the black ship how come pike and khan didnt slow down'
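
A possible variation on this approach, wrapped in a helper so it can be mapped over the whole list. This is only a sketch: `clean_question` is a hypothetical name, the sample string is shortened and includes an invented `\n10` item, and the use of `\d+` plus the final whitespace collapse are additions that go beyond the answer above:

```python
import re

def clean_question(text):
    # Hypothetical helper: same three steps as above, with two small changes.
    # r'\n\d+' instead of r'\n\d' so that markers like '\n10' are removed whole.
    text = re.sub(r'\n\d+', ' ', text)
    text = text.replace('\nspoilers', '')
    text = text.replace('\n', ' ')
    # Collapse runs of spaces left behind by the replacements.
    return re.sub(r' {2,}', ' ', text).strip()

sample = ('why did they \n\nspoilers\nspoilers\n\n1make warping'
          '\n2what were those particles\n10last one')
print(clean_question(sample))
# why did they make warping what were those particles last one
```

For the full list this could be applied as `question_text_list = [clean_question(x) for x in question_text_list]`.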

