Byte file conversion via Python 3

Question

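Is there a simple way to read the contents of a binary file as a binary string, convert it to a normal (UTF-8) string, do some operations on it, then convert it back to a binary string and write it to a binary file? I tried something as simple as: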

a_file = open('image1.png', 'rb')
text = b''
for a_line in a_file:
    text += a_line
a_file.close()
text2 = text.decode('utf-8')
text3 = text2.encode()
a_file = open('image2.png', 'wb')
a_file.write(text3)
a_file.close()

But I get 'Unicode can not decode bytes in position...'

What am I doing wrong?

Answer 1

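The UTF-8 format has enough structure that an arbitrary sequence of bytes, such as the contents of a PNG file, is generally not valid UTF-8. The best approach is to work directly with the bytes you read from the file (you can fetch them in one step with text = a_file.read()). Binary strings (type bytes) have all the string methods you are likely to need, even text-oriented ones like isupper() or swapcase(). There is also bytearray, the mutable counterpart of the bytes type.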
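As a minimal sketch of working on the bytes directly, the snippet below writes a small sample file into a temporary directory, reads it back in one step, applies a few bytes methods, and edits a bytearray copy before writing it out. The sample data and the names raw, swapped, buf and written are illustrative assumptions, not part of the original question.

```python
import os
import tempfile

# Assumed sample data: the 8-byte PNG signature plus a few arbitrary bytes.
data = b"\x89PNG\r\n\x1a\n" + bytes([0xFF, 0xFE, 0x00, 0x41])

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "image1.png")
    dst = os.path.join(tmp, "image2.png")
    with open(src, "wb") as f:
        f.write(data)

    # Read the whole file in one step; the result is a bytes object.
    with open(src, "rb") as f:
        raw = f.read()

    # bytes offers the familiar string methods; arguments must be bytes too.
    has_png_signature = raw.startswith(b"\x89PNG")
    swapped = raw.swapcase()  # only ASCII letters change case

    # bytearray is mutable, so it can be edited in place.
    buf = bytearray(raw)
    buf[-1] = 0x42   # overwrite the last byte
    buf += b"\x00"   # append one more byte

    # Write the modified bytes back without any decode/encode step.
    with open(dst, "wb") as f:
        f.write(buf)

    with open(dst, "rb") as f:
        written = f.read()
```

No text encoding is involved at any point, so no byte value can trigger a decoding error.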

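If for some reason you really do want to convert your bytes into a str object, use a pure 8-bit encoding such as Latin1. It maps each byte value 0-255 to the code point with the same number, so decoding never fails. You will get a Unicode string, which is what you are really after. (UTF-8 is just one encoding of Unicode; the two are not the same thing.)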
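The round trip through an 8-bit encoding can be sketched as follows. The input here is an assumed test value containing every possible byte, and the replace() call is only a stand-in for whatever string operation is actually wanted.

```python
# Assumed test input: every possible byte value, so nothing is left untested.
raw = bytes(range(256))

# Latin-1 decoding never raises: each byte becomes exactly one character.
text = raw.decode("latin-1")

# Ordinary str operations now apply (this replacement is just a placeholder).
edited = text.replace("\x00", "\x01")

# Encoding back with the same codec restores one byte per character.
back = edited.encode("latin-1")

# For comparison, UTF-8 rejects this input, as in the question.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

One caveat: encoding back to Latin-1 only works while the edited string contains no code points above U+00FF, so the operations should not introduce characters outside that range.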

Tags: python, encode, decode, utf-8, python-3.x