UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 398: invalid start byte || book Python for Everybody

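Hey, I'm trying to pull an image from a web server using socket programming in Python, while working through the network programming chapter of the book Python for Everybody. I copied the code from the example urljpeg.py: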

import socket 
import time 
#HOST = 'data.pr4e.org'
#PORT = 80

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

mysock.connect(('data.pr4e.org', 80))

mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""

while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    # time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data

mysock.close()


# look for the end of the header (2crlf)

pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())

# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()

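The error message says that you are trying to decode data that is not UTF-8. So why does that happen? Let's take a step back and look at what the code is doing: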

# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())

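We try to find the sequence \r\n\r\n, i.e. CR LF CR LF, in the data. This is the blank line that separates the HTTP header (which should be ASCII, a subset of UTF-8) from the actual image data. We then try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the output you posted earlier we can see that it is -1, which means the picture.find call didn't find anything! With pos equal to -1, picture[:pos] is everything except the last byte, image data included, and JPEG data begins with the byte 0xff, which is not valid UTF-8. So why wasn't the sequence found? Well, look closely at what the code actually does: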

# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")

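It is supposed to look for \r\n\r\n, but it is actually looking for r\n\r\n! The first backslash is missing, so the pattern should be b"\r\n\r\n".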
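
To make this concrete, here is a minimal sketch of both the failure and one possible fix. The sample response bytes and the helper name split_http_response are made up for illustration and are not from the book; the only real APIs used are the standard bytes.find, slicing, and decode.

```python
# Made-up sample response (an assumption for illustration): ASCII header,
# a blank line, then bytes that begin like JPEG data (0xff 0xd8).
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0 fake image bytes"
)

# The buggy pattern lacks the first backslash, so it is never found.
bad_pos = response.find(b"r\n\r\n")
print("buggy find:", bad_pos)  # find() returns -1 when nothing matches

# With pos == -1 the slice is everything except the last byte, image
# data included, so decoding it as UTF-8 fails.
try:
    response[:bad_pos].decode()
except UnicodeDecodeError as err:
    print("decode failed:", err)


def split_http_response(raw):
    """Hypothetical helper: split raw bytes into (header text, body bytes)."""
    pos = raw.find(b"\r\n\r\n")  # correct pattern: CR LF CR LF
    if pos == -1:
        raise ValueError("no blank line found between header and body")
    return raw[:pos].decode(), raw[pos + 4:]


header, body = split_http_response(response)
print("Header length", len(header))
print(body[:2])
```

Checking for -1 before slicing turns the silent mistake into an immediate, readable error instead of a confusing UnicodeDecodeError further down.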