How to read a file line by line, find matching strings, and split the results into multiple files
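Here is the scenario: I have a pattern file that I need to read line by line.
The content of the pattern file looks something like this: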
chicken
chicken
chicken
chicken
## comment
## comment
fish
fish
chicken
chicken
chicken
The code I have come up with so far is this:
def readlines_write(filename, new_filename):
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                # Every matching line is appended to the same output file.
                with open(new_filename, 'a+') as new_rl:
                    new_rl.write(line)
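With the code above I can find every "chicken" line in the pattern file, and the results are written to new_filename. But that is not the objective, because everything ends up in a single file.
I want to split the chicken lines across multiple files.
For example, the script should read line by line, and once it finds chicken, keep going until the next line does not contain chicken. That run should be broken off and written to a file, e.g. a.out.
Then the script should continue reading line by line, find the next run of matches after the "comment" and "fish" lines, and write that result to b.out.
I have pseudocode in mind, but I am not sure how to turn it into Python logic.
To summarize: I want to split the chicken lines into groups wherever they are separated by comments or by words other than chicken.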
Just add an else branch and keep changing the file name, using an integer or a timestamp.
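def readlines_write(filename):
    i = 0
    new_filename = 'filename{}.out'.format(i)
    with open(filename) as rl:
        for line in rl:
            if "chicken" in line:
                with open(new_filename, 'a+') as new_rl:
                    new_rl.write(line)
            else:
                # Every non-matching line bumps the counter, so each group
                # gets its own file name, but the numbers are not consecutive.
                i += 1
                new_filename = 'filename{}.out'.format(i)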
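If consecutive file numbers matter, one possible variation is to start a new file only when a run of matching lines begins, rather than on every non-matching line. The sketch below is an illustration under assumptions, not part of the answer above: the function name `split_runs`, the `out_dir` and `keyword` parameters, and the choice to open files in 'w' mode (so reruns overwrite instead of appending) are all hypothetical.

```python
import os

def split_runs(lines, out_dir, keyword="chicken"):
    """Write each consecutive run of matching lines to its own numbered file."""
    written = []   # paths of the files created so far, in order
    out = None     # currently open output file, or None between runs
    try:
        for line in lines:
            if keyword in line:
                if out is None:
                    # First matching line of a new run: start the next file.
                    name = "filename{}.out".format(len(written))
                    path = os.path.join(out_dir, name)
                    out = open(path, "w")
                    written.append(path)
                out.write(line)
            elif out is not None:
                # A non-matching line ends the current run.
                out.close()
                out = None
    finally:
        # Close the last file if the input ended in the middle of a run.
        if out is not None:
            out.close()
    return written
```

It accepts any iterable of lines, so an open file object can be passed directly, e.g. `split_runs(open_file, '.')`.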
So what you are looking for is consecutive groups of chicken lines, and you want to put each group into a separate file. Good news: batteries are included.
import itertools

def is_chicken(x):
    return 'chicken' in x  # Can add more complex logic.

def write_groups(input_sequence):
    count = 1
    grouper = itertools.groupby(input_sequence, is_chicken)
    for found, group in grouper:
        # The value of `found` here is what `is_chicken` returned;
        # we only want groups where it returned true.
        if found:
            with open('file-%d.chicken' % count, 'w') as f:
                f.writelines(group)
            count += 1
Now you can do:
with open('input_file') as input_file:
    write_groups(input_file)
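To see what itertools.groupby produces here, it may help to run it on the sample data from the question without writing any files. This is only an illustrative sketch: the `sample` list is typed in by hand to mirror the pattern file, and `runs` is a name made up for the demonstration.

```python
import itertools

def is_chicken(x):
    return 'chicken' in x

# Hand-typed copy of the pattern file from the question.
sample = (["chicken\n"] * 4 + ["## comment\n"] * 2
          + ["fish\n"] * 2 + ["chicken\n"] * 3)

# groupby starts a new group each time the key (True/False) changes.
runs = [(found, len(list(group)))
        for found, group in itertools.groupby(sample, is_chicken)]
print(runs)  # [(True, 4), (False, 4), (True, 3)]
```

The comment and fish lines all map to False, so they form a single group of four lines between the two chicken groups.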
The same thing can be done in a more functionally decomposed way, though it is a bit harder to follow if you are not used to generators:
def get_groups(input_sequence):
    grouper = itertools.groupby(input_sequence, is_chicken)
    # Return a generator producing only the groups we want.
    return (group for (found, group) in grouper if found)

with open('input_file') as input_file:
    for (count, group) in enumerate(get_groups(input_file), start=1):
        with open('file-%d.chicken' % count, 'w') as f:
            f.writelines(group)
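One caveat with the generator version: each group yielded by itertools.groupby shares the underlying iterator, so a group has to be consumed before moving on to the next one, as the loop above does. If the groups need to be kept around (for example, to count them before writing), a possible variant is to copy each one into a list first. The name `get_group_lists` and its `predicate` parameter are hypothetical, chosen only for this sketch.

```python
import itertools

def get_group_lists(input_sequence, predicate):
    """Return each matching run as its own list, safe to use later."""
    return [list(group)  # copy the run out before groupby advances
            for found, group in itertools.groupby(input_sequence, predicate)
            if found]

lines = ["chicken\n", "chicken\n", "fish\n", "chicken\n"]
groups = get_group_lists(lines, lambda x: 'chicken' in x)
print(len(groups))  # 2
```

This trades memory for convenience, since every matching line is held in memory at once.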