How to skip blocks of text when writing a file in Python
Is it possible to skip blocks of text with Python when writing a file from another file?
For example, let's say the input file is:
This is the file I would like to write this line
I would like to skip this line
and this one...
and this one...
and this one...
but I want to write this one
and this one...
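How can I write a script that skips certain lines, which differ in content and size, and resumes writing lines to the other file once it recognizes a certain line?
My code reads through the lines, doesn't write duplicate lines, and performs some operations on the lines using dictionaries and regular expressions.
Pseudo-code: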
# Open input and output files, and declare the unwanted function
for line in file1:
    if unwanted(line):
        continue
    file2.write(line)
# Close files etc...
You can read the file line by line and control what happens to each line you read:
with open(<your_file>, 'r') as lines:
    for line in lines:
        # skip this line
        # but not this one
Note that if you want to read all the lines regardless of content and only then operate on them, you can do:
with open(<your_file>) as fil:
    lines = fil.readlines()
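Once all the lines are in a list, as with readlines() above, a block can be dropped by slicing. This is only a minimal sketch, assuming the block to skip is delimited by marker substrings; drop_block, start_marker, and end_marker are hypothetical names and not from the question.

```python
import io


def drop_block(lines, start_marker, end_marker):
    """Return a new list without the first block of unwanted lines.

    The block runs from the first line containing start_marker up to,
    but not including, the next line containing end_marker.
    (Hypothetical helper; the markers are assumptions for illustration.)
    """
    start = next((i for i, text in enumerate(lines) if start_marker in text), None)
    if start is None:
        return list(lines)  # nothing to skip
    end = next(
        (i for i in range(start + 1, len(lines)) if end_marker in lines[i]),
        len(lines),  # no end marker: drop everything to the end
    )
    return lines[:start] + lines[end:]


# io.StringIO stands in for a real file opened with open()
fake_file = io.StringIO("a\nBEGIN\nx\ny\nEND\nb\n")
all_lines = fake_file.readlines()
result = drop_block(all_lines, "BEGIN", "END")
print(result)  # ['a\n', 'END\n', 'b\n']
```

Reading everything first costs memory proportional to the file size, so this suits small files; for large files, filtering while iterating is usually the safer choice.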
This should work:
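SIZE_TO_SKIP = ?  # placeholder: set this to the length of the lines to skip
CONTENT_TO_SKIP = "skip it"
with open("my/input/file") as input_file:
    with open("my/output/file", 'w') as output_file:
        for line in input_file:
            # Each line still carries its trailing newline, so strip it
            # before comparing length and content
            text = line.rstrip("\n")
            if len(text) != SIZE_TO_SKIP and text != CONTENT_TO_SKIP:
                output_file.write(line)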
def is_wanted(line):
    #
    # You have to define this!
    #
    # return True to keep the line, or False to discard it
    raise NotImplementedError("define is_wanted() for your data")

def copy_some_lines(infname, outfname, wanted_fn=is_wanted):
    # Copy only the lines for which wanted_fn(line) is true
    with open(infname) as inf, open(outfname, "w") as outf:
        outf.writelines(line for line in inf if wanted_fn(line))

copy_some_lines("file_a.txt", "some_of_a.txt")
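Since the question mentions regular expressions, one way to fill in the predicate might look like the sketch below. make_wanted and the two example patterns are assumptions for illustration only; the real patterns depend on your data.

```python
import re


def make_wanted(*unwanted_patterns):
    """Build a predicate that rejects any line matching one of the patterns.

    Hypothetical helper: the patterns passed in are up to the caller.
    """
    compiled = [re.compile(pattern) for pattern in unwanted_patterns]

    def wanted(line):
        # Keep the line only if none of the unwanted patterns match
        return not any(rx.search(line) for rx in compiled)

    return wanted


# Example patterns (assumed): comment lines and lines containing "skip"
wanted = make_wanted(r"^\s*#", r"skip")
lines = ["keep me\n", "# a comment\n", "please skip this\n", "keep me too\n"]
kept = [line for line in lines if wanted(line)]
print(kept)  # ['keep me\n', 'keep me too\n']
```

The returned function takes a single line and returns a bool, so it could be passed as the wanted_fn argument of copy_some_lines.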
To extend this to multi-line blocks, you can implement a finite state machine, which would turn into something like:
class BlockState:
    GOOD_BLOCK = True
    BAD_BLOCK = False

    def __init__(self):
        self.state = self.GOOD_BLOCK

    def is_bad(self, line):
        # *** Implement this! ***
        # return True if line is bad
        raise NotImplementedError

    def is_good(self, line):
        # *** Implement this! ***
        # return True if line is good
        raise NotImplementedError

    def __call__(self, line):
        # Switch state when a line marks the start or end of a bad block
        if self.state == self.GOOD_BLOCK:
            if self.is_bad(line):
                self.state = self.BAD_BLOCK
        else:
            if self.is_good(line):
                self.state = self.GOOD_BLOCK
        return self.state
then
copy_some_lines("file_a.txt", "some_of_a.txt", BlockState())
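To see the same two-state logic run end to end, here is a self-contained sketch written as a generator and applied to the sample input from the question. skip_blocks, starts_block, and ends_block are hypothetical names, and the two startswith tests are assumptions about how the first and last lines of a block could be recognized.

```python
import io


def skip_blocks(lines, starts_block, ends_block):
    """Yield lines, dropping each unwanted block.

    A block begins at a line where starts_block(line) is true (that line
    is dropped) and ends at the next line where ends_block(line) is true
    (that line is kept).
    """
    skipping = False
    for line in lines:
        if skipping:
            if ends_block(line):
                skipping = False
        elif starts_block(line):
            skipping = True
        if not skipping:
            yield line


SAMPLE = """\
This is the file I would like to write this line
I would like to skip this line
and this one...
and this one...
and this one...
but I want to write this one
and this one...
"""

# io.StringIO stands in for the input file; a real script would use open()
kept = list(
    skip_blocks(
        io.StringIO(SAMPLE),
        starts_block=lambda line: line.startswith("I would like to skip"),
        ends_block=lambda line: line.startswith("but I want to write"),
    )
)
print("".join(kept), end="")
# This is the file I would like to write this line
# but I want to write this one
# and this one...
```

Because the generator reads one line at a time, the result can be passed straight to writelines() on the output file without loading the whole input into memory.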