Error when using the .split() function with urllib.request

Question

I am trying to split the source of the BBC home page in two to get the top headline:

import urllib.request

url = 'http://www.bbc.com/'
page = urllib.request.urlopen(url)
contents = page.read()
page.close()

split1 = '<a class="media__link" href="/news/world-us-canada-39965107" rev="hero1|headline">\n'
split2 = '\n</a>'

title = contents.split(split1)[1].split(split2)[1]

print(title)

But I get this error:

title = contents.split(split1)[1].split(split2)[1]
TypeError: a bytes-like object is required, not 'str'

Answer 1

HTTPResponse.read([amt]):

Reads and returns the response body, or up to the next amt bytes.

contents = page.read()

returns a bytes object, not a str, so the separators passed to split() must be bytes objects as well. Prefixing the string literals with b is enough:

split1 = b'<a class="media__link" href="/news/world-us-canada-39965107" rev="hero1|headline">\n'
split2 = b'\n</a>'
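
As a rough sketch of the two ways to keep the types consistent, the snippet below uses a hard-coded bytes value in place of page.read(), so it runs without network access. The sample markup, the headline text, and the UTF-8 encoding are assumptions made for illustration and are not taken from the real BBC page. It also takes index [0] after the second split, on the assumption that the goal is the text before the closing tag.

```python
# Hypothetical stand-in for page.read(); a real response body is bytes too.
contents = (
    b'<a class="media__link" href="/news/example" rev="hero1|headline">\n'
    b'Example headline\n</a><p>rest of page</p>'
)

# Option 1: stay in bytes, so every separator is a bytes literal.
start = b'rev="hero1|headline">\n'
end = b'\n</a>'
title_bytes = contents.split(start)[1].split(end)[0]

# Option 2: decode once (UTF-8 is assumed here), then use plain str separators.
text = contents.decode('utf-8')
title_str = text.split('rev="hero1|headline">\n')[1].split('\n</a>')[0]

print(title_bytes)
print(title_str)
```

Decoding once up front is usually more convenient if the result will be printed or processed further as text; staying in bytes avoids having to know the page's encoding.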

