Python unable to decode byte string

Question

I'm having trouble decoding byte strings that have to be sent from one computer to another. The file is a PDF. I get this error:

fileStrings[i] = fileStrings[i].decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 648: invalid continuation byte

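Any ideas on how to remove the b' ' marker? I need to reassemble the file on the other side, but I also need to know its size in bytes before sending it, and I figured I would get that by decoding each byte string (this works for txt files but not for pdf files...).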

The code is:

    with open(inputne, "rb") as file:
        while 1:
            readBytes= file.read(dataMaxSize)
            fileStrings.append(readBytes)
            if not readBytes:
                break
            readBytes= ''
    
    filesize=0
    for i in range(0, len(fileStrings)):
        fileStrings[i] = fileStrings[i].decode()
        filesize += len(fileStrings[i])

Edit: For anyone having the same problem, len() gives you the size without the b''.

Answer 1

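In Python, byte strings (bytes) are for raw binary data and strings (str) are for text. Calling decode() with no arguments tries to interpret the data as UTF-8. That works for txt files that happen to be UTF-8 encoded, but not for pdf files, because they can contain arbitrary bytes that do not form valid UTF-8 sequences. You should not try to get a string at all here, since byte strings are designed for exactly this kind of data. You can get the length of a byte string as usual with len(data), which is its size in bytes. Many string operations also work on byte strings, such as concatenation and slicing (data1 + data2 and data[1:3]).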
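As a rough sketch of the approach described above, the chunks can stay as bytes and the size can be summed with len(), with no decode() call. The helper name read_chunks, the chunk size of 4, and the sample bytes are made up for illustration; they are not from the question.

```python
import os
import tempfile


def read_chunks(path, chunk_size):
    """Read a file in binary mode and return its chunks as a list of bytes objects."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            chunks.append(chunk)
    return chunks


# Hypothetical sample data: starts like a PDF and contains bytes
# (0xda followed by 0xff) that are not valid UTF-8.
sample = b"%PDF-1.4\n\xda\xff\x00binary"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

chunks = read_chunks(path, chunk_size=4)

# Size in bytes, computed directly on the bytes objects.
filesize = sum(len(chunk) for chunk in chunks)

# The same number is available from the filesystem without reading the file.
size_on_disk = os.path.getsize(path)

# Concatenating the chunks rebuilds the original data unchanged.
rebuilt = b"".join(chunks)

# Decoding this data as UTF-8 fails, as in the question.
try:
    sample.decode()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

os.remove(path)
```

If only the total size is needed before sending, os.path.getsize(path) alone is enough and avoids holding the whole file in memory.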

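As a side note, the b'' you see when printing is only there because the __str__ method of byte strings is equivalent to repr. It is not part of the data itself.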
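A small illustration of that point, using a made-up three-byte value: the b'' appears only in the printed representation, while the length and the contents are just the three bytes.

```python
data = b"abc"

# str() on a bytes object gives the same text as repr(), including the b'' wrapper.
shown = str(data)
print(shown)          # b'abc'
print(shown == repr(data))  # True

# The data itself holds only the three bytes.
print(len(data))      # 3
print(data[0])        # 97, the integer value of the first byte

# Decoding is only appropriate when the bytes really are encoded text.
text = data.decode("ascii")
print(text)           # abc
```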
