Followed the urllib.request.urlopen documentation but it still doesn't work

Question

I followed the documentation, but I still get an error that I can't figure out. I'm using Python 3.

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://pythonscraping.com/pages/page1.html')

bs = BeautifulSoup(html.read(), "html.parser")
print(bs.h1)

[Screenshot: code editor showing the code and the resulting errors]

Answer 1

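You did everything right.

The URL you provided is served over HTTPS, and the error you are getting is caused by a problem with the website's certificate.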
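To confirm that a failure really is a certificate problem and not some other network error, one option is to inspect the exception that urlopen raises. The sketch below is only an illustration: the helper names is_certificate_error and try_open are made up for this example, and it assumes Python 3.7 or later, where ssl.SSLCertVerificationError is available.

```python
import ssl
from urllib.error import URLError
from urllib.request import urlopen


def is_certificate_error(exc: Exception) -> bool:
    """Return True if exc is a URLError caused by failed certificate verification."""
    return isinstance(exc, URLError) and isinstance(
        exc.reason, ssl.SSLCertVerificationError
    )


def try_open(url: str):
    """Open url; return the response, or None after reporting a certificate failure.

    Hypothetical helper for illustration only.
    """
    try:
        return urlopen(url)
    except URLError as exc:
        if is_certificate_error(exc):
            print(f"Certificate verification failed for {url}: {exc.reason}")
            return None
        raise  # some other network problem; let it propagate
```

Calling try_open('http://pythonscraping.com/pages/page1.html') would then either return the response object or print a message naming the certificate failure.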

If you just want to learn something new, simply change the URL to a different example website.

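If you need results from that specific URL no matter what, pass the keyword argument context to your urlopen call and give it an SSL context that skips certificate verification (note that this removes the protection HTTPS normally provides):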

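from ssl import create_default_context, CERT_NONE
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Build a default SSL context, then turn off certificate checks.
# Warning: this disables protection against man-in-the-middle attacks.
context = create_default_context()
context.check_hostname = False  # must be disabled before setting CERT_NONE
context.verify_mode = CERT_NONE

html = urlopen('http://pythonscraping.com/pages/page1.html', context=context)
bs = BeautifulSoup(html.read(), "html.parser")
print(bs.h1)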
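If the same workaround is needed in more than one place, the context setup can be wrapped in a small helper so that skipping verification is always an explicit choice. This is a minimal sketch, not part of the original answer: make_context and fetch are hypothetical names, and verification stays on unless verify=False is passed. Note that check_hostname has to be switched off before verify_mode can be set to CERT_NONE; otherwise the ssl module raises a ValueError.

```python
import ssl
from urllib.request import urlopen


def make_context(verify: bool = True) -> ssl.SSLContext:
    """Return an SSL context; with verify=False, certificate checks are skipped.

    Hypothetical helper for illustration only.
    """
    context = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking must be off before CERT_NONE is set.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url: str, verify: bool = True) -> bytes:
    """Download url and return the raw response body."""
    with urlopen(url, context=make_context(verify)) as response:
        return response.read()
```

With this in place, fetch('http://pythonscraping.com/pages/page1.html', verify=False) returns the page bytes, which can be handed to BeautifulSoup(..., "html.parser") as before.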

python

urllib