Beautiful Soup HTML Not Matching 'View Page Source' in Browser

Question

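I have been trying to scrape a web page with bs4, but the HTML I get does not seem to match what I see when I use 'View Page Source' in Chrome. I am new to this, so any guidance would be much appreciated. Details below: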

An example of the target page is here, and the code I used is shown below.

import requests
from bs4 import BeautifulSoup

my_url = 'https://finance.yahoo.com/m/63c37511-b114-3718-a601-7e898a22439e/a-big-tech-encore-and-twitter.html'
response = requests.get(my_url)
doc = BeautifulSoup(response.text, "html.parser")

# Save the parsed HTML so it can be compared with the browser's page source
with open("output1.html", "w", encoding="utf-8") as file:
    file.write(str(doc))

When I view the page source in my browser (Chrome), the HTML contains the following snippet:

"siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\"

However, in the file written by the code above, siteAttribute has changed and no longer holds the same information. Instead, it shows:

"siteAttribute":"wiki_topics=\"Big_Tech;Apple_Inc.;Facebook;

I have searched online but still cannot work out what is causing this. Thanks in advance.

Answer 1

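If you right-click the page and choose Inspect to open Chrome DevTools, then press Ctrl+F and paste siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\, you will see that the value you are looking for sits inside a script tag. See the screenshot from here.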
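Since the value sits inside a script tag, one way to check what your own download contains is to search the script tags of the parsed document for siteAttribute. The sketch below is only an illustration under stated assumptions: find_site_attributes and SAMPLE_HTML are hypothetical names, and the sample markup is made up to mimic the snippet from the question, not copied from the real page. It assumes the value is embedded as a JSON-style string, so escaped quotes (\") are allowed inside it.

```python
import re
from bs4 import BeautifulSoup

# Matches "siteAttribute":"<value>", where <value> may contain escaped
# characters such as \" (assumption: the value is a JSON-style string).
SITE_ATTRIBUTE_RE = re.compile(r'"siteAttribute":"((?:\\.|[^"\\])*)"')


def find_site_attributes(html):
    """Return every siteAttribute value found inside <script> tags."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for script in soup.find_all("script"):
        # .string is None for an empty <script> or one that only has a src
        text = script.string or ""
        values.extend(SITE_ATTRIBUTE_RE.findall(text))
    return values


# Made-up markup for illustration only; the real page is far larger.
SAMPLE_HTML = (
    '<html><body><script>'
    'root.App = {"siteAttribute":"ticker=\\"GOOGL;AAPL;PYPL;TWTR\\"","x":1};'
    '</script></body></html>'
)

if __name__ == "__main__":
    for value in find_site_attributes(SAMPLE_HTML):
        print(value)
```

To try it on the real page, pass response.text from the question's code to find_site_attributes and compare the result with what Ctrl+F shows in the browser. If the two differ, the server sent your script a different document than it sent the browser, and the parser is not the cause.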
