Reading a binary file: does Python have an unget() equivalent?

Question

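I am looking for a 2-byte sequence in a binary file that is too large to fit in memory. I can't simply read 2 bytes at a time, because the sequence may straddle two reads, for example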

xx xx x1 2x xx

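Likewise, I can't simply look for the first byte and then check whether the next byte read is the second, because a failed check consumes a byte that might itself start the sequence: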

xx112xx

I would really like to be able to do something like this:

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        if byte == b'1':
            if f.read(1) == b'2':
                pass  # success case
            else:
                pass  # put back the latest byte somehow
        byte = f.read(1)

Is there some facility that does this kind of lookahead without my having to handle all the bookkeeping myself?

Answer 1

An io.BufferedReader object has a peek() method:

Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.
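The last sentence of that description matters in practice: peek(1) can hand back a larger chunk of the internal buffer, so it is safest to slice off the first byte. A minimal sketch, using an in-memory io.BytesIO wrapped in io.BufferedReader as a stand-in for a real file (the variable names are only illustrative):

```python
import io

# Wrap an in-memory stream so the example needs no file on disk.
reader = io.BufferedReader(io.BytesIO(b"abcdef"))

first = reader.read(1)               # consumes b"a"
peeked = reader.peek(1)              # may contain more than one byte
next_byte = peeked[:1]               # exactly the next byte, or b"" at end of file
position_after_peek = reader.tell()  # still 1: peek() did not advance the position
following = reader.read(1)           # the peeked byte is read again here
```

Slicing with [:1] also handles the end of the file cleanly, because an empty result stays empty.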

When you open a file for reading in binary mode, you get such an object, so you can use it directly in your code:

with open("myfile", "rb") as f:
    for byte in iter(lambda: f.read(1), b''):
        if byte == b'1':
            # peek() may return more than one byte, so compare only the first
            if f.peek(1)[:1] == b'2':
                pass  # success case

Keep in mind that the byte we peeked at is still "in the stream", so the next f.read() call will include it. If you don't want that, you have to issue an explicit f.read(1).
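Putting the pieces together, here is one possible sketch of the whole search. The function name find_pair_offsets and its parameters are hypothetical, and it assumes the stream is an io.BufferedReader (or anything else that offers peek()). Because the peeked byte is left in the stream, an input like xx112xx is handled correctly: the second 1 is read on the next iteration and checked in its own right.

```python
import io


def find_pair_offsets(stream, first=b"1", second=b"2"):
    """Yield the offset of each `first` byte immediately followed by `second`.

    `stream` is assumed to be a buffered binary reader that supports peek().
    """
    offset = 0
    for byte in iter(lambda: stream.read(1), b""):
        # peek() may return more than one byte (or b"" at end of file),
        # so only the first byte of its result is compared.
        if byte == first and stream.peek(1)[:1] == second:
            yield offset
        offset += 1


# Usage with an in-memory stream standing in for a large file.
sample = io.BufferedReader(io.BytesIO(b"xx112xx12"))
matches = list(find_pair_offsets(sample))
```

With a real file, the object returned by open("myfile", "rb") can be passed in directly in place of sample.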

I replaced your while loop with the two-argument form of iter(), so that a for loop reads the file 1 byte at a time.
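For readers who have not seen that form before: iter(callable, sentinel) keeps calling the callable and stops as soon as it returns the sentinel. Since read(1) returns b'' at end of file, that value serves as the sentinel. A small illustration on an in-memory stream (the names are only for demonstration):

```python
import io

stream = io.BytesIO(b"abc")

# read(1) is called repeatedly; iteration ends when it returns b"".
chunks = list(iter(lambda: stream.read(1), b""))
```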

python

file

lookahead