Unable to read in all the html of a webpage using Beautifulsoup

Question

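I am trying to use Beautifulsoup to get a 10-K form from the SEC. Unfortunately, the code below does not show all of the html: the printed output starts somewhere in the middle of the document. However, it works fine on the few other webpages I have tried. Any help is much appreciated. I am new to Python coding and hope to learn more, as it is starting to grow on me :)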

import urllib.request, urllib.error
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://www.sec.gov/Archives/edgar/data/920148/000092014820000011/lh10-k2019.htm"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify().encode("utf-8"))

Answer 1

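What is probably happening is that your terminal does not keep enough lines of output (its scrollback buffer is too small), so you only see the tail end of what was printed, while the whole page is actually there. My guess is that the pages that work are much shorter.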
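
One way to check this guess is to stop relying on the terminal: write the text to a file and look at its size and its first and last lines. The following is only a minimal sketch under stated assumptions. The helper name `save_and_summarize` and the file name `page_dump.html` are made up for illustration, and a synthetic string stands in for `soup.prettify()` so the snippet runs without network access.

```python
import os
import tempfile
from pathlib import Path


def save_and_summarize(text, path):
    """Write text to a file and return a few facts about it.

    Hypothetical helper, not part of BeautifulSoup or urllib.
    """
    Path(path).write_text(text, encoding="utf-8")
    lines = text.splitlines()
    return {
        "chars": len(text),
        "lines": len(lines),
        "first_line": lines[0] if lines else "",
        "last_line": lines[-1] if lines else "",
    }


# Stand-in for soup.prettify(); a real 10-K filing would be far longer.
sample = "<html>\n" + "<p>row</p>\n" * 5000 + "</html>"

# Assumed output location: a file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "page_dump.html")
summary = save_and_summarize(sample, out_path)
print(summary)
```

In the question's script, the same helper could be called as `save_and_summarize(soup.prettify(), "page_dump.html")`. If the first line reported is the start of the document and the saved file opens completely in an editor or browser, the download and parse were fine and only the terminal display was cut off.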

Tags: html, python, parsing, beautifulsoup, html-parsing