Python: scan file for substring, save position, then return to it

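I'm writing a script that needs to scan a file until it finds the line where a substring appears, save the position of the beginning of that line, and then return to it. I'm new to Python, so I haven't had much success yet. This is my current code: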

with open("test.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            pos = f.tell() - len(line.encode('utf-8'))
            # pos = f.tell()

        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

With test.txt:

That is not dead
Which can eternal lie
Till through strange aeons
Even Death may die

Sphinx of black quartz, judge my vow!

This is the output:

hat is not dead

[newline character]

I realized that my original pos = f.tell() gave me the position of the end of the line rather than the beginning. I found an answer detailing how to get the byte length of a string, but using this cuts off the first character. Using utf-16 or utf-16-le gives ValueError: negative seek position -18 or ValueError: negative seek position -16, respectively. I tried to use the solution from this answer, using this code:

with open("ctest.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            print(line)
            f.seek(-len(line), 1)
            zz = f.readline()
            print(zz)
        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

which gives io.UnsupportedOperation: can't do nonzero cur-relative seeks at f.seek(-len(line), 1)
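For reference, this error is specific to text mode: a file opened in text mode only accepts a cur-relative seek with an offset of 0. A minimal sketch, assuming the file is opened in binary mode ("rb") instead, where relative seeks are allowed and len(line) is a true byte count. The file name demo_seek.txt and its contents are made up for the illustration, and in binary mode the substring test would need bytes as well (b"That is not dead" in line):

```python
import os
import tempfile

# Hypothetical sample file, written in binary so the line endings are exact.
path = os.path.join(tempfile.gettempdir(), "demo_seek.txt")
with open(path, "wb") as f:
    f.write(b"That is not dead\r\nWhich can eternal lie\r\n")

with open(path, "rb") as f:
    line = f.readline()        # bytes, including the trailing b"\r\n"
    f.seek(-len(line), 1)      # allowed in binary mode: step back one line
    reread = f.readline()      # reads the same line again

print(reread == line)
os.remove(path)
```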

Can someone point out where I'm going wrong?
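One explanation consistent with the numbers above, assuming test.txt uses Windows line endings (\r\n): in text mode Python translates \r\n to \n, so the string returned by readline() is one character shorter than the bytes on disk, and subtracting its encoded length from f.tell() lands one byte past the start of the line. The same assumption reproduces the -18 and -16 from the utf-16 attempts. A minimal sketch of the arithmetic only; it does not open a file, and it assumes f.tell() reports a plain byte offset here, which the documentation does not guarantee for text files:

```python
raw = b"That is not dead\r\n"    # bytes on disk, assuming CRLF line endings
line = "That is not dead\n"      # what readline() returns in text mode

end_of_line = len(raw)           # the value f.tell() would report after line 1

pos_utf8 = end_of_line - len(line.encode("utf-8"))
pos_utf16 = end_of_line - len(line.encode("utf-16"))      # adds a 2-byte BOM
pos_utf16_le = end_of_line - len(line.encode("utf-16-le"))

print(pos_utf8, pos_utf16, pos_utf16_le)   # 1 -18 -16
```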

Stefan Papp suggested saving the position before reading the line, which is a simple solution I hadn't considered. The adjusted version:

with open("test.txt") as f:
    pos = 0
    tempPos = 0
    line = f.readline()
    while line:
        if "That is not" in line:
            pos = tempPos

        tempPos = f.tell()
        line = f.readline()

    f.seek(pos)
    str = f.readline()
    print(str)

Correct output:

That is not dead
[newline character]

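Thanks, Stefan. I think I was too deep in my problem to think about it clearly. If there is a better way to iterate through the file than what I've done, I'd love to know, but this seems to work.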
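On the closing question, one possible tidier shape is to wrap the same idea in a function. This is only a sketch: find_line_start is a hypothetical helper name, the sample file is made up, and unlike the loop above it stops at the first match instead of keeping the last one. It keeps readline() instead of "for line in f" because calling f.tell() during that kind of iteration raises OSError on text files.

```python
import os
import tempfile


def find_line_start(path, needle):
    """Return the offset of the first line containing needle, or None."""
    with open(path) as f:
        while True:
            start = f.tell()      # position before the line is read
            line = f.readline()
            if not line:          # empty string means end of file
                return None
            if needle in line:
                return start


# Hypothetical sample file for the demonstration.
path = os.path.join(tempfile.gettempdir(), "demo_find.txt")
with open(path, "w") as f:
    f.write("Which can eternal lie\nThat is not dead\n")

pos = find_line_start(path, "That is not dead")
missing = find_line_start(path, "Sphinx")

with open(path) as f:
    f.seek(pos)                   # jump back to the saved line start
    found = f.readline()

print(found)
os.remove(path)
```

The offset returned by tell() on a text file is best treated as an opaque value that is only passed back to seek() on the same file, which is all this sketch does with it.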