Selective text using Python
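I'm a beginner in Python and I'm using it for my master's thesis, so I don't know it very well. I have a bunch of annual report files (in txt format), and I want to select all the text between "ITEM1." and "ITEM2.". I'm using the re package. My problem is that sometimes these 10-Ks contain a section called "ITEM1A.". I want the code to recognize this, stop at "ITEM1A.", and output the text between "ITEM1." and "ITEM1A.". In the code attached to this post I tried to make it stop at "ITEM1A.", but it doesn't; it keeps going, because "ITEM1A." appears multiple times in the file. Ideally it would stop at the first one it sees. The code is as follows: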
import os
import re

# path to where the 10-Ks are
saved_path = "C:/Users/Adrian PC/Desktop/Thesis stuff/10k abbot/python/Multiple 10k/saved files/"

# path to where to save the txt with the selected text between ITEM 1 and ITEM 2
selected_path = "C:/Users/Adrian PC/Desktop/Thesis stuff/10k abbot/python/Multiple 10k/10k_select/"

# get a list of all the items in that specific folder and put it in a variable
list_txt = os.listdir(saved_path)

for text in list_txt:
    file_path = saved_path + text
    file = open(file_path, "r+", encoding="utf-8")
    file_read = file.read()

    # looking between ITEM 1 and ITEM 2
    res = re.search(r'(ITEM[\s\S]*1\.[\w\W]*)(ITEM+[\s\S]*1A\.)', file_read)
    item_text_section = res.group(1)

    saved_file = open(selected_path + text, "w+", encoding="utf-8")  # save the file with the complete names
    saved_file.write(item_text_section)  # write to the new text files with the selected text
    saved_file.close()  # close the file
    print(text)  # show the progress
    file.close()
If anyone has any suggestions on how to solve this, that would be great. Thanks!
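Try the following regular expression:
ITEM1\.([\s\S]*?)ITEM1A\.
Adding the question mark makes the quantifier non-greedy, so the match stops at the first occurrence of "ITEM1A." instead of running on to the last one.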
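As a rough sketch of how that non-greedy pattern might be used, the snippet below wraps it in a small helper. The function name extract_item1, the sample string, the optional whitespace (\s*) and the fallback to "ITEM2." when a report has no "ITEM1A." section are all assumptions added for illustration; they are not part of the original question or answer. Note that re.search returns None when nothing matches, so the result should be checked before calling .group().

```python
import re

# Non-greedy capture: stop at the first "ITEM1A." if there is one,
# otherwise at the first "ITEM2." (fallback is an assumption, not from the answer).
# \s* tolerates spellings such as "ITEM 1." (also an assumption about the files).
ITEM1_PATTERN = re.compile(r'ITEM\s*1\.([\s\S]*?)ITEM\s*(?:1A|2)\.')


def extract_item1(text):
    """Return the text between the first ITEM1. marker and the next
    ITEM1A. or ITEM2. marker, or None if no such span is found."""
    match = ITEM1_PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical sample in which "ITEM1A." appears more than once
sample = "ITEM1. Business overview. ITEM1A. Risk factors. ITEM1A. again ITEM2. Properties."
print(repr(extract_item1(sample)))
```

In the loop from the question, file_read would take the place of sample, and files for which the helper returns None could be skipped or logged rather than written out.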