Python: Split text by keyword into excel rows

I'm new to programming and have found many useful threads, but none that does quite what I need.
I have a text file that looks like this:

  1 of 5000 DOCUMENTS


                    Copyright 2010 The Deal, L.L.C.
                          All Rights Reserved
                          Daily Deal/The Deal

                        January 12, 2010 Tuesday

HEADLINE: Cadbury slams Kraft bid

BODY:

  On cue .....

......

body of article here

......

DEAL SIZE

$ 10-50 Billion

                            2 of 5000 DOCUMENTS


                    Copyright 2015 The Deal, L.L.C.
                          All Rights Reserved
                           The Deal Pipeline

                      September 17, 2015 Thursday

HEADLINE: Perrigo rejects formal offer from Mylan

BODY: 
(and here again the body of this article)

DEAL SIZE

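As output, I want only the body of each article, each on a new row in a single file (one cell per article body). I have about 5000 articles to process this way, so the output would be 5000 rows and 1 column. From what I've read, 're' seems to be the best solution. The recurring keywords are BODY: and perhaps DOCUMENTS. For each article, how do I extract the text between these keywords into a new row in Excel?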

import re
inputtext = 'F:\text.txt'
re.split(r'\n(?=BODY:)', inputtext)

Or something like this?

section = []
for line in open_file_object:
    if line.startswith('BODY:'):
        # new section
        if section:
            process_section(section)
        section = [line]
    else:
        section.append(line)
if section:
    process_section(section)

I'm a bit lost about where to look. Thanks in advance!

Edit: Thanks to ewwink, this is where I am now:

import re
articlesBody = None
# Raw string so that backslashes such as the '\t' in '\test folder' are not read as escapes.
with open(r'F:\CloudStation\Bocconi University\MSc. Thesis\test folder\majortest.txt', 'r') as txt:
    inputtext = txt.read()
    articlesBody = re.findall(r'BODY:(.+?)\d\sDOCUMENTS', inputtext, re.S)

#print(articlesBody)
#print(type(articlesBody))

with open('result.csv', 'w') as csv:
    for item in articlesBody:
        item = item.replace('\n', ' ')
        csv.write('"%s",' % item)

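To handle the file, use with open('F:\text.txt', mode), where mode is 'r' for reading and 'w' for writing. To extract the content, use re.findall. Finally, you need to escape newlines \n, double quotes ", and other special characters.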

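import re

with open('text.txt', 'r') as txt:
    inputtext = txt.read()

# Capture everything after each "BODY:" up to the next "N of 5000" header,
# or up to the end of the file so that the last article is not dropped.
articlesBody = re.findall(r'BODY:(.+?)(?=\d+\sof\s5000|\Z)', inputtext, re.S)

#print(articlesBody)

with open('result.csv', 'w') as out:
    for item in articlesBody:
        # Escape newlines and double the quotes so each body stays in one cell.
        item = item.strip().replace('\n', '\\n').replace('"', '""')
        # One quoted field per line gives one row per article.
        out.write('"%s"\n' % item)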
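
The manual quoting above can also be delegated to Python's standard csv module, which quotes fields containing commas, double quotes, or newlines on its own. The following is only a sketch under some assumptions: the helper names extract_bodies and write_bodies are made up for illustration, the pattern assumes every article begins with a header of the form "N of M DOCUMENTS", and any trailing lines such as "DEAL SIZE" stay part of the captured body.

```python
import csv
import io
import re

# Assumption: each article starts with a header like "2 of 5000 DOCUMENTS".
# The lookahead stops the body at the next header or at the end of the text.
BODY_RE = re.compile(r'BODY:(.+?)(?=\d+\s+of\s+\d+\s+DOCUMENTS|\Z)', re.S)


def extract_bodies(text):
    """Return the text after each 'BODY:' marker, with outer whitespace trimmed."""
    return [body.strip() for body in BODY_RE.findall(text)]


def write_bodies(bodies, handle):
    """Write one article body per row; csv.writer adds quoting where needed."""
    writer = csv.writer(handle)
    for body in bodies:
        writer.writerow([body])


# Small made-up sample in the same layout as the input file.
sample = (
    "1 of 2 DOCUMENTS\n\nHEADLINE: First\n\nBODY:\n\n"
    "On cue, \"quoted\" text\nsecond line\n\n"
    "2 of 2 DOCUMENTS\n\nHEADLINE: Second\n\nBODY:\nAnother body\n"
)

buffer = io.StringIO()
write_bodies(extract_bodies(sample), buffer)
print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, pass a handle opened with open('result.csv', 'w', newline=''), which is how the csv documentation recommends opening files for csv.writer. Excel can then open the resulting file with one article per row.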

Another note: try it on a small sample of the content first.
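
One more caveat about Windows paths such as 'F:\text.txt': in an ordinary Python string literal, \t is read as a tab character, so the file may not be found. A short illustration of the usual workarounds (the variable names are only for demonstration):

```python
plain = 'F:\text.txt'    # '\t' is interpreted as a single tab character
raw = r'F:\text.txt'     # a raw string keeps the backslash as written
forward = 'F:/text.txt'  # forward slashes are also accepted on Windows

print(repr(plain))
print(repr(raw))
print('\t' in plain, '\t' in raw)
```

Using either the raw string or the forward-slash form in open() avoids the problem.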