Trouble with web scraping using string methods in Python

Question

I'm very new to web scraping, and I'm writing a simple Python program that uses string methods such as str.find().

Currently, I extract a web page's HTML as a string with:

from urllib.request import urlopen

html_str = urlopen(url).read().decode('utf-8')

However, I'm confused about why not all of the code is returned. For example, a YouTube channel page shows the subscriber count as:

<yt-formatted-string id="subscriber-count" class="style-scope ytd-c4-tabbed-header-renderer">106M subscribers</yt-formatted-string>

But this string does not appear in html_str.

So, what is going wrong? Am I doing something wrong or using something incorrectly?

Answer 1

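urlopen only downloads the HTML that the server sends; it does not run the page's JavaScript. Content that a page fills in with JavaScript after it loads, which is likely the case for the subscriber count here, will therefore not appear in html_str. One library I know of that does give you the JavaScript-generated content is Selenium, since it drives a real browser. The trade-off is that it runs noticeably slower than other scraping libraries.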
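As a rough sketch of the difference, the snippet below checks for the element with str.find() in two strings: one standing in for the raw source that urlopen returns, and one containing the rendered element from the question. The helper names extract_element_text and fetch_rendered_html, and the raw_html sample, are made up for illustration. The Selenium function assumes Selenium and a Chrome driver are installed, and a real page may also need an explicit wait before the element shows up.

```python
def extract_element_text(html, marker):
    """Return the text of the first element whose opening tag contains
    `marker`, using only str.find(); return None if it is not present."""
    pos = html.find(marker)
    if pos == -1:
        return None                      # marker is not in this HTML at all
    tag_end = html.find('>', pos)        # end of the opening tag
    if tag_end == -1:
        return None
    text_end = html.find('<', tag_end)   # start of the next tag
    if text_end == -1:
        return None
    return html[tag_end + 1:text_end]


def fetch_rendered_html(url):
    """Hypothetical helper: load `url` in a real browser via Selenium and
    return the HTML after scripts have run. Not called in this sketch."""
    from selenium import webdriver       # imported lazily; optional dependency
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Made-up stand-in for the raw source: a script tag, but no rendered element.
raw_html = '<html><body><script src="app.js"></script></body></html>'

# The rendered element as shown in the question.
rendered_html = (
    '<yt-formatted-string id="subscriber-count" '
    'class="style-scope ytd-c4-tabbed-header-renderer">'
    '106M subscribers</yt-formatted-string>'
)

marker = 'id="subscriber-count"'
print(extract_element_text(raw_html, marker))       # None
print(extract_element_text(rendered_html, marker))  # 106M subscribers
```

If str.find() returns -1 on the string from urlopen, that usually means the text was never in the server's response, not that the string method is misbehaving.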
