How to decode a binary file with "for index, line in enumerate(file)"?

Question

I am opening an extremely large binary file in Python 3.5, in file1.py:

with open(pathname, 'rb') as file:
    for i, line in enumerate(file):
        # parsing here

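Naturally, I get an error: because the file is opened in binary mode, every line is a bytes object, and the loop body then mixes str and bytes, which is where the code fails.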

If I were reading the file line by line, I would do this:

with open(fname, 'rb') as f:
    lines = [x.decode('utf8').strip() for x in f.readlines()]

However, I am using for index, lines in enumerate(file):. What is the correct approach in this case? Do I decode the next objects?

Here is the actual code I am running:

with open(bam_path, 'rb') as file:
    for i, line in enumerate(file):
        line_data=pd.DataFrame({k.strip():v.strip()
            for k,_,v in (e.partition(':')
                for e in line.split('\t'))}, index=[i])

And here is the error:

Traceback (most recent call last):
  File "file1.py", line 18, in <module>
    for e in line.split('\t'))}, index=[i])
TypeError: a bytes-like object is required, not 'str'

Answer 1

You can feed a generator of decoded lines to enumerate:

for i, line in enumerate(l.decode(errors='ignore') for l in f):

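This does the trick of yielding every line in f after decoding it. I added errors='ignore' because opening the file with 'r' failed on unknown start bytes; note that this option silently drops any bytes that cannot be decoded.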
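As a minimal sketch of the generator approach above, the snippet below decodes each line lazily and then applies the same partition/split parsing as the question. The sample data, the io.BytesIO stand-in for the real file, and the helper name parse_lines are assumptions made for illustration, and UTF-8 is assumed as the encoding.

```python
import io

# Hypothetical sample data standing in for the real file.
# The \xff byte is not valid UTF-8, so errors='ignore' drops it.
raw = b"name: alice\tage: 30\nname: b\xffob\tage: 25\n"


def parse_lines(binary_file):
    """Decode each line lazily, then parse tab-separated 'key: value' fields."""
    records = []
    decoded = (l.decode('utf-8', errors='ignore') for l in binary_file)
    for i, line in enumerate(decoded):
        fields = {k.strip(): v.strip()
                  for k, _, v in (e.partition(':') for e in line.split('\t'))}
        records.append((i, fields))
    return records


# io.BytesIO iterates line by line, like a file opened with 'rb'.
records = parse_lines(io.BytesIO(raw))
print(records)
```

Because the decoding happens inside a generator expression, only one line is held in memory at a time, which matters for a very large file.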

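As an aside, when operating on bytes you could simply replace all string literals with bytes literals, i.e. partition(b':') and split(b'\t'), and do your work using bytes (pretty sure pandas works fine with them).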
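A small sketch of the bytes-literal alternative, under the same assumptions (hypothetical sample data, io.BytesIO in place of the real file, and an illustrative helper name parse_bytes_lines). No decoding takes place, so the resulting keys and values remain bytes objects.

```python
import io

# Hypothetical sample data standing in for the real file.
raw = b"name: alice\tage: 30\n"


def parse_bytes_lines(binary_file):
    """Parse tab-separated 'key: value' fields without decoding to str."""
    rows = []
    for i, line in enumerate(binary_file):
        # Bytes literals keep every operand a bytes object, avoiding the TypeError.
        row = {k.strip(): v.strip()
               for k, _, v in (e.partition(b':') for e in line.split(b'\t'))}
        rows.append((i, row))
    return rows


rows = parse_bytes_lines(io.BytesIO(raw))
print(rows)
```

If str values are needed later, individual fields can still be decoded at the point of use.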

Tags: python, binary, decode, enumerate, python-3.x