Delete certain text pattern in python
I am trying to delete a certain text pattern from my .txt file. The text looks something like this:
mystring = '''
example deletion words
in the first block
First sentence to keep.
example deletion words
in the second block
Second sentence to keep.
example deletion words
in the third block
Third sentence to keep.
example deletion words
in the fourth block'''
My desired output is as follows:
"First sentence to keep.
Second sentence to keep.
Third sentence to keep."
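So what I want to do is remove all text between the strings "example" and "block", including the strings themselves. Any idea how I could solve this in R or Python?
Sorry for forgetting to include my regex attempt; I asked on the spur of the moment, and thanks to those who made the effort to answer anyway. Here is my working solution in Python, using regular expressions and the re package: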
import re
cleanedtext = re.sub('\nexample.*?block','',mystring, flags=re.DOTALL)
print(cleanedtext)
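As a rough illustration of why this pattern needs both the non-greedy `.*?` and `re.DOTALL`, here is a small sketch that runs three variants on the same `mystring` from the question. The variable names `no_dotall`, `greedy`, and `lazy` are only illustrative and are not part of the original solution.

```python
import re

mystring = '''
example deletion words
in the first block
First sentence to keep.
example deletion words
in the second block
Second sentence to keep.
example deletion words
in the third block
Third sentence to keep.
example deletion words
in the fourth block'''

# Without re.DOTALL, "." does not match a newline, so a block that spans
# two lines is never matched and nothing is removed.
no_dotall = re.sub(r'\nexample.*?block', '', mystring)

# With re.DOTALL but a greedy ".*", the match runs from the first "example"
# to the last "block" and swallows the sentences in between.
greedy = re.sub(r'\nexample.*block', '', mystring, flags=re.DOTALL)

# Non-greedy ".*?" stops at the nearest "block", so only the blocks go.
lazy = re.sub(r'\nexample.*?block', '', mystring, flags=re.DOTALL)

print(repr(lazy))
```

Note that the result still begins with a newline, because `mystring` itself starts with one; calling `.strip()` on the result would be one way to drop it.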
In R, you can use str_remove_all from stringr:
stringr::str_remove_all(string, "example.*block")
#[1] " First sentence to keep.\nSecond sentence to keep.\nThird sentence to keep.\n"
which is shorthand for:
stringr::str_replace_all(string, "example.*block", "")
Data
string <- "example deletion words in the first block First sentence to keep.
example deletion words in the second blockSecond sentence to keep.
example deletion words in the third blockThird sentence to keep.
example deletion words in the fourth block"
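For comparison, a Python sketch of the same single-line case might look like the following. It assumes, as in the R data above, that every block sits on one line; the greedy `.*` is safe here only because `.` does not cross line breaks by default.

```python
import re

# Same data as the R example: each block is on a single line.
string = (
    "example deletion words in the first block First sentence to keep.\n"
    "example deletion words in the second blockSecond sentence to keep.\n"
    "example deletion words in the third blockThird sentence to keep.\n"
    "example deletion words in the fourth block"
)

# No re.DOTALL: "." stops at each newline, so the greedy match is
# confined to one line and ends at the last "block" on that line.
cleaned = re.sub(r"example.*block", "", string)

print(repr(cleaned))
```

If a block spans several lines, as in the question's `mystring`, this pattern matches nothing and a non-greedy pattern with `re.DOTALL` is needed instead.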
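Do you know the pattern in advance, or will it change? If it does not change, you can read the text file, go through it line by line, split each line into words to make it easier to work with, and then look for the pattern. Lines that do not contain it can be concatenated onto a new string. What I have below seems to work: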
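final = ""
# "with" closes the file automatically when the loop is done
with open("mytext.txt", "r") as f:
    for line in f:
        # split() with no argument also drops the trailing newline,
        # so a last line without "\n" is handled too
        words = line.split()
        # skip lines that start with "example" or end with "block"
        if words and (words[0] == "example" or words[-1] == "block"):
            continue
        final = final + line
print(final)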
The output I get is:
First sentence to keep.
Second sentence to keep.
Third sentence to keep.
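The line-by-line check above relies on each block being exactly two lines, one starting with "example" and one ending with "block". If a block can span more lines, one possible variant is to track whether the loop is currently inside a block. The following is only a sketch under that assumption; the function name `drop_blocks` and its parameters are hypothetical, and it works on a string rather than a file.

```python
def drop_blocks(text, start="example", end="block"):
    """Remove every run of lines from one starting with `start`
    through the next one ending with `end` (inclusive)."""
    kept = []
    inside = False  # True while we are inside a block to delete
    for line in text.splitlines():
        words = line.split()
        if not inside and words and words[0] == start:
            inside = True
        if inside:
            # The block ends on the first line whose last word is `end`.
            if words and words[-1] == end:
                inside = False
            continue
        kept.append(line)
    return "\n".join(kept)


sample = """example deletion words
in the first block
First sentence to keep.
example deletion words
spanning three
lines in the second block
Second sentence to keep."""

print(drop_blocks(sample))
```

Unlike the regex approach, this only recognizes "example" as the first word of a line and "block" as the last word of a line, which may or may not be what the real file needs.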