Python: scan file for substring, save position, then return to it
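I'm writing a script that needs to scan a file until it finds the line containing a substring, save the position of the beginning of that line, and then return to it. I'm new to Python, so I haven't had much success yet. Here is my current code: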
with open("test.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            pos = f.tell() - len(line.encode('utf-8'))
            # pos = f.tell()
        line = f.readline()
    f.seek(pos)
    str = f.readline()
    print(str)
With test.txt:
That is not dead
Which can eternal lie
Till through strange aeons
Even Death may die
Sphinx of black quartz, judge my vow!
This is the output:
hat is not dead
[newline character]
I realized that my original pos = f.tell() gave me the position of the end of the line rather than the beginning. I found an answer detailing how to get the byte length of a string, but using it cuts off the first character. Using utf-16 or utf-16-le gives ValueError: negative seek position -18 or ValueError: negative seek position -16, respectively. I then tried the solution from this answer, using this code:
with open("ctest.txt") as f:
    pos = 0
    line = f.readline()
    while line:
        if "That is not dead" in line:
            print(line)
            f.seek(-len(line), 1)
            zz = f.readline()
            print(zz)
        line = f.readline()
    f.seek(pos)
    str = f.readline()
    print(str)
This raises io.UnsupportedOperation: can't do nonzero cur-relative seeks at f.seek(-len(line), 1).
Can someone point out where I'm going wrong?
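A likely explanation for both symptoms, offered as a sketch rather than a confirmed diagnosis: in text mode, f.tell() returns an opaque number, and f.seek() only accepts zero or a value previously returned by f.tell(), so arithmetic on positions and cur-relative seeks are not supported. If the file uses \r\n line endings (an assumption about test.txt), the line returned by readline() is also one character shorter than the bytes on disk, which would explain the missing first character. In binary mode, positions are plain byte offsets and both operations work. The helper name find_line_start_binary and the temporary demo file are illustrative only.

```python
import os
import tempfile


def find_line_start_binary(path, needle):
    """Return the byte offset of the first line containing needle, or None.

    needle must be bytes because the file is opened in binary mode.
    """
    with open(path, "rb") as f:
        line = f.readline()
        while line:
            if needle in line:
                # In binary mode tell() is a byte count, so subtracting
                # the length of the line just read gives its start.
                return f.tell() - len(line)
            line = f.readline()
    return None


# Demo file with \r\n line endings (an assumption, to mimic the symptom).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Which can eternal lie\r\nThat is not dead\r\nEven Death may die\r\n")

offset = find_line_start_binary(demo_path, b"That is not dead")

with open(demo_path, "rb") as f:
    f.seek(offset)
    found = f.readline().decode("utf-8")

    # A relative seek from the current position is allowed in binary mode.
    f.seek(-len(found.encode("utf-8")), os.SEEK_CUR)
    again = f.readline().decode("utf-8")

os.remove(demo_path)
print(repr(found))
```

The trade-off is that binary mode yields bytes, so the search string must be bytes and each line has to be decoded explicitly.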
Stefan Papp suggested saving the position before reading the line, a simple solution I hadn't considered. The adjusted version:
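with open("test.txt") as f:
    pos = 0
    tempPos = 0  # position of the start of the line currently held in `line`
    line = f.readline()
    while line:
        if "That is not" in line:
            pos = tempPos
        # remember where the next line starts before reading it
        tempPos = f.tell()
        line = f.readline()
    f.seek(pos)
    str = f.readline()
    print(str)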
Correct output:
That is not dead
[newline character]
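Thanks, Stefan. I think I was too deep in my problem to think about it clearly.
If there's a better way to iterate through the file than what I've done, I'd love to know, but this seems to work.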
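On the question of a tidier way to walk the file, one possible variant is sketched below. It records f.tell() immediately before each readline() and stops at the first match, whereas the loop above keeps scanning and ends up with the last match. The function name find_line_start and the temporary sample file are hypothetical, introduced only for illustration. A plain "for line in f" loop is deliberately avoided, because in text mode Python disables tell() while the file is being iterated.

```python
import os
import tempfile


def find_line_start(f, needle):
    """Return a tell() position for the start of the first line
    containing needle, or None if no line matches.

    f is a file object opened in text mode; scanning starts from its
    current position.
    """
    while True:
        start = f.tell()      # position before the line is read
        line = f.readline()
        if not line:          # empty string means end of file
            return None
        if needle in line:
            return start


# Illustrative sample file; the contents are borrowed from test.txt above.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("Which can eternal lie\nThat is not dead\nEven Death may die\n")

with open(sample_path, encoding="utf-8") as f:
    pos = find_line_start(f, "That is not")
    if pos is not None:
        f.seek(pos)           # valid: pos came straight from f.tell()
        matched = f.readline()
    # Scanning the rest of the file for absent text returns None.
    missing = find_line_start(f, "no such text")

os.remove(sample_path)
print(matched, end="")
```

Returning None when nothing matches lets the caller tell "not found" apart from a match at position 0, which the pos = 0 default in the loop above cannot do.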