Reading a text file from a certain point (Python)

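I'm trying to write code that finds a specific word in a file and reads from there until it hits the same word again. In this case the word is "story". The code counts lines until the word appears, and the second loop is supposed to start counting again from 0. I've tried using functions and global variables, but I always get the same number twice and I don't know why.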

file = open("testing_area.txt", "r")
line_count = 0
counting = line_count

for line in file.readlines()[counting:]:
        if line != "\n":
            line_count = line_count + 1
            if line.startswith('story'):
                #line_count += 1
                break
          
print(line_count)

for line in file.readlines()[counting:]:
        if line != "\n":
            line_count = line_count + 1
            if line.startswith('story'):
                #line_count += 1
                break

print(line_count)
file.close()

Output:

6
6

Expected output:

6
3

This is the text file:

text
text
text
text
text
story
text
text
story

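There are a few problems here. The first is that, for a given file object, readlines() can basically only be used once. Imagine the text file open in an editor, with the cursor at the very beginning. readline() (singular) reads the next line and moves the cursor down by one line; readlines() (plural) reads every line from the cursor's current position to the end. Once you have called it, there are no more lines left to read, so a second call returns an empty list and your second loop never runs. You can fix this by putting something like lines = file.readlines() at the top and then looping over the resulting list. (The Python tutorial's section on reading and writing files covers this in detail.)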
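The cursor behaviour described above can be seen in a small sketch. It uses io.StringIO as a stand-in for a real file (an assumption made only to keep the example self-contained; a file opened with open() behaves the same way), and the variable names are illustrative.

```python
import io

# io.StringIO stands in for a real file here; it has the same read cursor.
sample = "text\ntext\nstory\ntext\nstory\n"
f = io.StringIO(sample)

first_read = f.readlines()   # reads everything; the cursor is now at the end
second_read = f.readlines()  # nothing left to read, so this is an empty list

print(len(first_read))   # 5
print(len(second_read))  # 0

f.seek(0)                    # move the cursor back to the start
third_read = f.readlines()   # the lines are available again
print(len(third_read))   # 5
```

Calling seek(0) is another way around the problem, although reading the lines once into a list is usually simpler.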

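However, you also never reset line_count to 0 and never set counting to anything other than 0, so the loops still wouldn't do what you intend. You want something more like this: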

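with open("testing_area.txt") as f:
    lines = f.readlines()

first_count = 0
# Index where the second loop resumes; stays at the end if 'story' is never found.
# Tracked separately from first_count so that skipped blank lines don't shift the slice.
resume_at = len(lines)
for index, line in enumerate(lines):
    if line != "\n":
        first_count += 1
        if line.startswith('story'):
            resume_at = index + 1
            break
print(first_count)

second_count = 0
for line in lines[resume_at:]:
    if line != "\n":
        second_count += 1
        if line.startswith('story'):
            break
print(second_count)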

(This also uses the with keyword, which automatically closes the file even if the program runs into an exception.)
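As a quick check of that claim, here is a minimal sketch that raises an exception inside a with block and then inspects the file object. The temporary file and the simulated ValueError are assumptions made purely for illustration.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on testing_area.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("text\nstory\n")
    path = tmp.name

try:
    with open(path) as f:
        f.readline()
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

closed_after_error = f.closed  # the with block closed the file on the way out
print(closed_after_error)      # True

os.remove(path)  # clean up the temporary file
```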

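That said, you don't really need two loops in the first place. You're looping over a single set of lines, so as long as you reset the line number, you can do it all in one pass: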

line_no = 0
words_found = 0

with open('testing_area.txt') as f:
    for line in f:
        if line == '\n':
            continue
        line_no += 1
        if line.startswith('story'):
            print(line_no)
            line_no = 0
            words_found += 1
            if words_found == 2:
                break

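(Using if line == '\n': continue is functionally the same as putting the rest of the loop body inside if line != '\n':, but I like to avoid the extra level of indentation. It's mostly a matter of personal preference.)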

The code can be simplified to:

with open("testing_area.txt", "r") as file:              # Context manager preferred for file open
    first, second = None, None                           # line numbers of first and second occurrence of 'story'
    for line_count, line in enumerate(file, start=1):    # provides line number and content
        if line.startswith('story'):                     # a blank line never matches, so no separate check is needed
            if first is None:
                first = line_count   # first is None, so this must be the first
            else:
                second = line_count  # previously found first, so this is the second
                break                # have now found first & second

# Line number of the first occurrence and number of lines from the first to the second.
# Assumes 'story' occurs at least twice; unlike the versions above, blank lines are counted too.
print(first, second - first)
# Output: 6 3

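Since the question doesn't say that only two occurrences need to be counted, here is a solution that reads the whole file and prints the count every time "story" is found.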

# Using with to open file is preferred as file will be properly closed
with open("testing_area.txt") as f:
    line_count = 0
    for line in f:
        line_count += 1
        if line.startswith("story"):
            print(line_count)
            # reset the line_count if "story" found
            line_count = 0

Output:

6
3
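If the counts are needed later rather than just printed, the same loop can be wrapped in a function that returns them as a list. This is only a sketch: the name counts_between and its word parameter are hypothetical, and skipping blank lines is an assumption carried over from the question's code. The sample list mirrors the text file from the question.

```python
def counts_between(lines, word="story"):
    """Return, for each line starting with `word`, the number of non-blank
    lines read since the previous match (the matching line included)."""
    counts = []
    line_count = 0
    for line in lines:
        if line == "\n":          # skip blank lines, as the question's code does
            continue
        line_count += 1
        if line.startswith(word):
            counts.append(line_count)
            line_count = 0        # start counting again after each match
    return counts


# Same layout as the sample file: five "text" lines, "story", two "text", "story".
sample = ["text\n"] * 5 + ["story\n", "text\n", "text\n", "story\n"]
result = counts_between(sample)
print(result)  # [6, 3]
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example counts_between(f) inside a with open(...) as f: block.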