Python: Split text by keyword into Excel rows

Question

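I'm new to programming. I have found plenty of helpful threads, but none that covers exactly what I need.
I have a text file that looks like this: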

  1 of 5000 DOCUMENTS


                    Copyright 2010 The Deal, L.L.C.
                          All Rights Reserved
                          Daily Deal/The Deal

                        January 12, 2010 Tuesday

HEADLINE: Cadbury slams Kraft bid

BODY:

  On cue .....

......

body of article here

......

DEAL SIZE

$ 10-50 Billion

                            2 of 5000 DOCUMENTS


                    Copyright 2015 The Deal, L.L.C.
                          All Rights Reserved
                           The Deal Pipeline

                      September 17, 2015 Thursday

HEADLINE: Perrigo rejects formal offer from Mylan

BODY: 
(and here again the body of this article)

DEAL SIZE

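As output, I want only the body of each article, each on a new row in a single file (one cell per article body). I have about 5000 articles to process this way, so the output would be 5000 rows and 1 column. From what I have read, 're' seems to be the best solution. The recurring keywords are BODY: and perhaps DOCUMENTS. For each article, how do I extract the text between these keywords into a new row in Excel?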

import re
inputtext = 'F:\text.txt'
re.split(r'\n(?=BODY:)', inputtext)

Or something like that?

section = []
for line in open_file_object:
    if line.startswith('BODY:'):
        # new section
        if section:
            process_section(section)
        section = [line]
    else:
        section.append(line)
if section:
    process_section(section)

I'm a bit lost as to where to look. Thanks in advance!

Edit: thanks to ewwink, this is where I am now:

import re
articlesBody = None
with open('F:\CloudStation\Bocconi University\MSc. Thesis\test folder\majortest.txt', 'r') as txt:
  inputtext = txt.read()
  articlesBody = re.findall(r'BODY:(.+?)\d\sDOCUMENTS', inputtext, re.S)

#print(articlesBody)
#print(type(articlesBody))

  with open('result.csv', 'w') as csv:
   for item in articlesBody:
    item = item.replace('\n', ' ')
    csv.write('"%s",' % item)

Answer 1

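To work with the file, use with open('F:\text.txt', mode), where mode is 'r' for reading or 'w' for writing. To extract the content, use re.findall. Finally, you need to escape newlines (\n), double quotes (") and other special characters.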

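import re

articlesBody = None
with open('text.txt', 'r') as txt:
  inputtext = txt.read()
  # Capture everything between "BODY:" and the next "N of 5000" header.
  articlesBody = re.findall(r'BODY:(.+?)\d\sof\s5000', inputtext, re.S)

#print(articlesBody)

with open('result.csv', 'w') as csv:
  for item in articlesBody:
    # Escape newlines and double the quotes so each body stays in one cell.
    item = item.replace('\n', '\\n').replace('"', '""')
    # End each record with a newline so every article gets its own row.
    csv.write('"%s"\n' % item)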
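As an alternative to escaping by hand, the standard csv module can do the quoting. The following is only a sketch under a few assumptions: every article starts with a header line of the form "N of M DOCUMENTS", the body is everything after the "BODY:" marker up to the next header (so it still includes the DEAL SIZE section), and the function names extract_bodies and write_rows are made up for this example.

```python
import csv
import re

# Assumed layout: each article starts with a line like "12 of 5000 DOCUMENTS".
HEADER = re.compile(r'^\s*\d+ of \d+ DOCUMENTS\s*$', re.M)
# "BODY:" is only treated as a marker when it starts a line.
BODY_MARK = re.compile(r'^BODY:', re.M)


def extract_bodies(text):
    """Return the body of each article as a single-line string."""
    bodies = []
    # Splitting on the headers leaves one article per chunk.
    for chunk in HEADER.split(text):
        parts = BODY_MARK.split(chunk, maxsplit=1)
        if len(parts) == 2:
            # Collapse every run of whitespace (newlines included) to a space.
            bodies.append(' '.join(parts[1].split()))
    return bodies


def write_rows(bodies, path):
    """Write one body per row; csv.writer handles quotes and commas."""
    with open(path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        for body in bodies:
            writer.writerow([body])
```

Usage would be along the lines of write_rows(extract_bodies(inputtext), 'result.csv'). Because the split is on the header lines, the last article is picked up even though no header follows it.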

One more note: try it on a small sample first.
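A small test along these lines shows one thing to watch for: the pattern above needs a following "N of 5000" header, so the last article in a file is not matched. The sample text below is made up in the same layout as the question's file, and the second pattern is a suggested variant, not part of the original answer.

```python
import re

# A made-up two-article sample in the same layout as the question's file.
SAMPLE = (
    "  1 of 5000 DOCUMENTS\n\n"
    "HEADLINE: Cadbury slams Kraft bid\n\n"
    "BODY:\n\n  On cue ...\n\n"
    "  2 of 5000 DOCUMENTS\n\n"
    "HEADLINE: Perrigo rejects formal offer from Mylan\n\n"
    "BODY: \nsecond body\n"
)

# Pattern from the answer: a body must be followed by the next header.
strict = re.findall(r'BODY:(.+?)\d\sof\s5000', SAMPLE, re.S)

# Variant (an assumption): stop at the next header or at the end of the input.
relaxed = re.findall(r'BODY:(.+?)(?=\d+\sof\s\d+\sDOCUMENTS|\Z)', SAMPLE, re.S)

print(len(strict))   # 1, the last article is missing
print(len(relaxed))  # 2
print([body.strip() for body in relaxed])
```

The lookahead in the variant does not consume the header, so the header digits never end up inside a captured body.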

python

regex

text

extract

sentiment-analysis