How to split a file into multiple files with findall

Question

I want to split my file into multiple files in Python, and I found that the findall function can do this.

My file contains:

**05/02/2020

- Test PC


- Electricite 
W=10
PUI=5



- Test MAPS
Nothing for now
- Date/Hours
DateTest=12h14
DateFinish=13h18

**05/02/2020

So I use the code below to split this file on "-", but it does not split it correctly.

import re

def main():

  with open('mesfile.log', 'r') as f:
      data = f.read()

  found = re.findall(r'\n*(- .*?\- .*?)\n*', data, re.M | re.S)

  [open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]

if __name__=="__main__":
  main()

Expected output

File 1 contains
- Test PC

File 2 contains
- Electricite 
W=10
PUI=5

File 3 contains
- Test MAPS
Nothing for now

File 4 contains
- Date/Hours
DateTest=12h14
DateFinish=13h18

**05/02/2020

Answer 1

You can try this, which is simpler and faster than using re:

with open('mesfile.log', 'r') as f:
    data = [i.strip() for i in f]

contents = []
for line in data:
    if line.startswith('-'):  # check for '-' separator
        contents.append([])
    if len(contents) > 0:     # ignore everything before the first separator
        contents[-1].append(line)

for i, text in enumerate(contents):
    with open('file_%05d.txt' % i, 'w') as fout:
        fout.write('\n'.join(text))

Answer 2

Use:

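filename = "mesfile.log"  # input file from the question

res = []
# read content, grouping lines under each "-" header
with open(filename) as infile:
    for line in infile:
        line = line.strip()
        if line.startswith("*") or not line:  # skip empty lines and date lines
            continue
        if line.startswith("-"):
            res.append([line])  # start a new section
        elif res:  # ignore anything before the first "-" header
            res[-1].append(line)

# write each section to its own file
for idx, data in enumerate(res):
    with open("file_{}".format(idx), "w") as outfile:
        # writelines() does not add newlines, so join the lines explicitly
        outfile.write("\n".join(data) + "\n")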

Answer 3

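You get these results because the pattern \n*(- .*?\- .*?)\n* requires two hyphens (each followed by a space) inside the capture group. Each match therefore runs from one "- " header up to the next one and consumes that second header's "- ", so your data yields only two matches, and the Electricite and Date/Hours sections never appear in the output.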
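To see this concretely, here is a small sketch that runs the question's pattern on a string reproducing mesfile.log. It assumes the file content is exactly what is shown in the question, with \n line endings.

```python
import re

# Sample input, assumed to match the question's mesfile.log exactly
data = (
    "**05/02/2020\n\n"
    "- Test PC\n\n\n"
    "- Electricite \nW=10\nPUI=5\n\n\n\n"
    "- Test MAPS\nNothing for now\n"
    "- Date/Hours\nDateTest=12h14\nDateFinish=13h18\n\n"
    "**05/02/2020\n"
)

# The pattern from the question: every match needs TWO "- " occurrences
found = re.findall(r'\n*(- .*?\- .*?)\n*', data, re.M | re.S)

for i, chunk in enumerate(found, start=1):
    print(i, repr(chunk))
# 1 '- Test PC\n\n\n- '
# 2 '- Test MAPS\nNothing for now\n- '
```

Only two chunks come back, and each one ends with the "- " it stole from the following header.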

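Instead, you can match a line that starts with a hyphen and a space, followed by all lines that do not start with that pattern.

Then store each match in a separate file.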

^- .*(?:\r?\n(?!- ).*)*

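^ Start of line
- .* Match - and a space, then any character except a newline until the end of the line
(?: Non-capturing group
- \r?\n Match a newline (optionally preceded by \r)
- (?!- ) Assert that what follows is not - and a space
- .* Match any character except a newline 0+ times
)* Close the non-capturing group and repeat it 0+ times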
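The breakdown above maps one-to-one onto a re.VERBOSE version of the pattern. This sketch (using the question's data, assumed to be exactly as shown) puts each part next to its explanation and checks that it finds the same matches as the compact form. Note that in verbose mode a literal space has to be escaped as "\ ".

```python
import re

# Sample input, assumed to match the question's mesfile.log exactly
data = (
    "**05/02/2020\n\n"
    "- Test PC\n\n\n"
    "- Electricite \nW=10\nPUI=5\n\n\n\n"
    "- Test MAPS\nNothing for now\n"
    "- Date/Hours\nDateTest=12h14\nDateFinish=13h18\n\n"
    "**05/02/2020\n"
)

SECTION = re.compile(r"""
    ^-\ .*        # '- ' at the start of a line, then the rest of that line
    (?:           # non-capturing group, repeated 0+ times:
        \r?\n     #   a line break (\r\n or \n)
        (?!-\ )   #   the next line must not start with '- '
        .*        #   the rest of that line
    )*
    """, re.M | re.X)

compact = re.findall(r'^- .*(?:\r?\n(?!- ).*)*', data, re.M)
assert SECTION.findall(data) == compact  # both spellings behave the same

headers = [m.splitlines()[0] for m in compact]
print(headers)  # ['- Test PC', '- Electricite ', '- Test MAPS', '- Date/Hours']
```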


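Note that you no longer need re.S: line breaks are matched explicitly by \r?\n, so . should keep its default behavior of not matching a newline.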
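A quick way to see why re.S would now hurt: with it, the first .* can also match newlines, so it greedily runs to the end of the input and everything collapses into one match. The two-section input below is a hypothetical minimal example.

```python
import re

pattern = r'^- .*(?:\r?\n(?!- ).*)*'
sample = "- a\n1\n- b\n2"  # hypothetical two-section input

without_s = re.findall(pattern, sample, re.M)
with_s = re.findall(pattern, sample, re.M | re.S)

print(without_s)  # ['- a\n1', '- b\n2']
print(with_s)     # ['- a\n1\n- b\n2']
```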

For example:

found = re.findall(r'^- .*(?:\r?\n(?!- ).*)*', data, re.M)
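Putting the two steps together (find the sections, then write each one to its own file), a minimal end-to-end sketch could look like the following. The helper name split_file is hypothetical, the output names 1.txt, 2.txt, ... follow the numbering in the question's code, and the demo runs in a temporary directory with the question's sample data so nothing in your working directory is touched.

```python
import re
import tempfile
from pathlib import Path

# Pattern from this answer: a "- " header line plus every following line
# that does not itself start with "- "
SECTION_RE = re.compile(r'^- .*(?:\r?\n(?!- ).*)*', re.M)

def split_file(src, out_dir):
    """Hypothetical helper: write each section of src to out_dir/1.txt, 2.txt, ..."""
    data = Path(src).read_text()
    paths = []
    for i, section in enumerate(SECTION_RE.findall(data), start=1):
        path = Path(out_dir) / f"{i}.txt"
        # drop the trailing blank lines between sections, keep one final newline
        path.write_text(section.rstrip() + "\n")
        paths.append(path)
    return paths

SAMPLE = (
    "**05/02/2020\n\n"
    "- Test PC\n\n\n"
    "- Electricite \nW=10\nPUI=5\n\n\n\n"
    "- Test MAPS\nNothing for now\n"
    "- Date/Hours\nDateTest=12h14\nDateFinish=13h18\n\n"
    "**05/02/2020\n"
)

# Demo only; for real use, call split_file("mesfile.log", ".") instead
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mesfile.log"
    src.write_text(SAMPLE)
    written = split_file(src, tmp)
    contents = [p.read_text() for p in written]

for text in contents:
    print(repr(text))
```

As in the expected output, the trailing **05/02/2020 line ends up in the last file, because it does not start with "- ".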
