Print HTML From Url

Question

So I want to print out the HTML of a website:

from urllib.request import urlopen

http = urlopen('http://www.google.de/').read()
print(http)

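But in the output, all the line breaks are printed as \n and the string starts with b'. From what I found on Google, this has something to do with byte arrays? Sorry, I'm new to Python xD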

So my question is: how do I print the HTML code as a normal string with real line breaks, the way it would appear in a text editor?

Answer 1

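Check the urlopen documentation: read() returns bytes, not a str, which is why you see b'...' and literal \n. The page's header declares charset=UTF-8, so decode the bytes as UTF-8 by changing the line to: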

print(http.decode('utf-8'))
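Rather than hardcoding UTF-8, you could also read the charset from the response's Content-Type header. This is a minimal sketch, not part of the original answer: fetch_html is a hypothetical helper name, and falling back to UTF-8 when the server sends no charset is an assumption. resp.headers.get_content_charset() is the standard library's message API for reading that parameter.

```python
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Hypothetical helper: download a page and decode it to str."""
    with urlopen(url) as resp:  # the with-block closes the connection
        # Charset from the Content-Type header, e.g. "utf-8";
        # assumed fallback to UTF-8 if the server does not send one.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With this, print(fetch_html('http://www.google.de/')) should show the HTML with real line breaks instead of b'...' and \n.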

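If the HTML contains bytes that are not valid UTF-8 (for example, because the site uses a different locale encoding), decoding will raise a UnicodeDecodeError. You can skip those bytes with: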

print(http.decode('utf-8', errors='ignore'))
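To see what errors='ignore' actually does, here is a small self-contained illustration on a hand-made byte string (the bytes are an assumed example, not real page data). It also shows errors='replace', a standard alternative that keeps a visible U+FFFD marker instead of silently dropping the bad byte.

```python
# Bytes as they might arrive from a page that is really Latin-1 encoded:
# 0xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence here.
raw = b"caf\xe9\n<b>ok</b>"

try:
    raw.decode("utf-8")  # default errors="strict" raises on the bad byte
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

ignored = raw.decode("utf-8", errors="ignore")    # bad byte is dropped
replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
print(ignored)
print(replaced)
```

Note that "ignore" loses characters without any warning; if the output looks wrong, the page is probably not UTF-8 and decoding with its real charset is the better fix.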
