How to use readline to find matching strings and split the results into multiple files

Question

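Here is the scenario: I have a pattern file that I need to read line by line.

The contents of the pattern file look something like this: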

chicken 
chicken
chicken
chicken
## comment
## comment
fish
fish
chicken
chicken
chicken

The code I have come up with so far is this:

def readlines_write():
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                with open(new_filename, 'a+') as new_rl:
                    new_rl.write(line)

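With the code above I can find every "chicken" line in the pattern file, and the results are written to new_filename. But that is not the objective, because everything ends up lumped together in a single file.

I want to split the chicken lines across multiple files.

For example, the end result should be: read line by line, and once a chicken line is found, keep going until the next line does not contain chicken. Break that run off and write it to a file, say a.out.

The script then keeps reading line by line, finds the next match after the "comment" and "fish" lines, and writes that result to b.out.

I have pseudocode in mind, but I am not sure how to turn it into Python logic.

In short, I want to split apart the runs of chicken lines that are separated by comments and by words other than chicken.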

Answer 1

Just add an else branch, and keep changing the filename using an integer or a timestamp.

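def readlines_write():
    # `filename` is the input path, defined elsewhere as in the question.
    i = 0
    in_group = False  # True while we are inside a run of chicken lines
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                in_group = True
                # Append to the output file for the current group.
                with open('filename{}.out'.format(i), 'a+') as new_rl:
                    new_rl.write(line)
            else:
                # Bump the index once per finished group, so that several
                # non-chicken lines in a row do not leave gaps in the numbering.
                if in_group:
                    i += 1
                in_group = False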

Answer 2

So what you are looking for are consecutive groups of chicken lines, and you want each group to go into its own file. Nice: batteries are included.

import itertools

def is_chicken(x):
    return 'chicken' in x # Can add more complex logic.

def write_groups(input_sequence):
    count = 1
    grouper = itertools.groupby(input_sequence, is_chicken)
    for found, group in grouper:
        # The value of `found` here is what `is_chicken` returned;
        # we only want groups where it returned true.
        if found:
            with open('file-%d.chicken' % count, 'w') as f:
                f.writelines(group)
            count += 1

Now you can do:

with open('input_file') as input_file:
    write_groups(input_file)
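
To see what groupby yields before touching the filesystem, here is a minimal sketch that runs it on an in-memory list shaped like the sample in the question. The names `sample` and `chicken_runs` are made up for this illustration.

```python
import itertools

def is_chicken(x):
    return 'chicken' in x

# Hypothetical in-memory stand-in for the pattern file in the question.
sample = (['chicken\n'] * 4 + ['## comment\n'] * 2
          + ['fish\n'] * 2 + ['chicken\n'] * 3)

# groupby starts a new group whenever the key (True/False here) changes,
# so each consecutive run of chicken lines becomes its own group.
chicken_runs = [list(group)
                for found, group in itertools.groupby(sample, is_chicken)
                if found]

print([len(run) for run in chicken_runs])  # prints [4, 3]
```

The comment and fish lines all map to the same key (False), so they collapse into one group that gets skipped.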

The same thing can be done in a more decomposed, functional style, but it is a bit harder to follow if you are not used to generators:

def get_groups(input_sequence):
    grouper = itertools.groupby(input_sequence, is_chicken)
    # Return a generator producing only the groups we want.
    return (group for (found, group) in grouper if found)


with open('input_file') as input_file:
    for (count, group) in enumerate(get_groups(input_file), start=1):
        with open('file-%d.chicken' % count, 'w') as f:
            f.writelines(group)
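
The question asks for output names like a.out and b.out rather than numbered files. One possible way to get that, as a rough sketch: the function name `write_lettered_groups` and the `out_dir` parameter are hypothetical, and it assumes there are never more than 26 groups.

```python
import itertools
import os
import string

def write_lettered_groups(lines, out_dir='.'):
    """Write each consecutive run of chicken lines to a.out, b.out, ..."""
    grouper = itertools.groupby(lines, lambda line: 'chicken' in line)
    # Keep only the groups whose key is True, i.e. the chicken runs.
    runs = (group for found, group in grouper if found)
    written = []
    for index, group in enumerate(runs):
        # Assumption: single-letter names only, so fail loudly past 'z'.
        if index >= len(string.ascii_lowercase):
            raise ValueError('more than 26 groups; pick another naming scheme')
        path = os.path.join(out_dir, string.ascii_lowercase[index] + '.out')
        with open(path, 'w') as f:
            f.writelines(group)
        written.append(path)
    return written
```

It can be called the same way as the versions above, for example `write_lettered_groups(input_file)` inside the `with open('input_file')` block.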
