How to skip some chunks in Python file-reading code?

Question

I have code like this:

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
while True:
    data = big_file.read(chunk_size)
    if not data:
        break

How can I read only every 10th chunk, or every 5th chunk, like this?

chunk_size = 512 * 1024  # 512 KB
big_file = open(file, 'rb')
counter = 0
while True:
    counter += 1
    if counter % 5 != 0:
        big_file.next(chunk_size)  # Just skip it, don't read it... HOW TO DO THIS LINE?
        continue  # I want to skip this chunk and read the next chunk in the next loop.
    data = big_file.read(chunk_size)
    if not data:
        break

Speed is very important to me in this case. I will be doing this for millions of files. I am doing chunk hashing.

Answer 1

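You can use the file's .seek() method for this. I use pos to keep track of the current position in the file. The data is only read with .read(chunk_size) on every 5th iteration.

Seeking beyond the end of the file is not a problem. data will be empty at that point, so we break out of the loop if nothing was read.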

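chunk_size = 512 * 1024  # 512 KB
counter = 0
pos = 0  # byte offset of the chunk for the current iteration

with open("filename", 'rb') as big_file:  # closes the file when the loop ends
    while True:
        counter += 1
        if counter % 5 == 0:
            # Only every 5th chunk is actually read; the others are skipped.
            big_file.seek(pos)
            data = big_file.read(chunk_size)
            if not data:
                break
            print(data.decode("utf-8"))  # do your processing here (decode() assumes UTF-8 text)

        pos += chunk_size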
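
Since the question mentions chunk hashing, the same idea can also be sketched as a generator that seeks relative to the current position instead of tracking pos by hand. This is only a sketch under assumptions: the names iter_every_nth_chunk and hash_sampled_chunks are made up for illustration, and SHA-256 is an arbitrary choice because the question does not say which hash is used.

```python
import hashlib
import os


def iter_every_nth_chunk(path, chunk_size=512 * 1024, n=5):
    """Yield every n-th chunk of the file at `path`, seeking over the rest."""
    with open(path, "rb") as f:
        # Jump over the first n - 1 chunks without reading them.
        f.seek((n - 1) * chunk_size)
        while True:
            data = f.read(chunk_size)
            if not data:
                # Nothing left (or we seeked past the end of the file).
                break
            yield data
            # Skip the next n - 1 chunks, relative to the current position.
            f.seek((n - 1) * chunk_size, os.SEEK_CUR)


def hash_sampled_chunks(path, chunk_size=512 * 1024, n=5):
    """Return one SHA-256 hex digest per sampled chunk (hash choice is an assumption)."""
    return [
        hashlib.sha256(chunk).hexdigest()
        for chunk in iter_every_nth_chunk(path, chunk_size, n)
    ]
```

With n=5 this reads the 5th, 10th, 15th, ... chunk, the same chunks as the loop above. Whether it is measurably faster depends on the storage and the operating system, so it is worth timing on real files before relying on it.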
