Why does the div come back empty when web-scraping the Steam game list?

Question

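I'm new to web scraping and to BeautifulSoup4, so I apologize if the answer is obvious.

I'm trying to get the playtime from Steam, but looking up <div id="games_list_rows" style="position: relative"> returns None, when it should return many <div class="gameListRow" id="game_730"> elements with content inside them.

I also tried the profile of a friend who has only played a few games, because I thought a large amount of data might make BS4 ignore the div, but the div still comes back empty.

Here is my code: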

import bs4 as bs
import urllib.request

# Retrieve profile
profile = "chubaquin"#input("enter profile: >")
search = "https://steamcommunity.com/id/"+profile+"/games/?tab=all"

sauce = urllib.request.urlopen(search)
soup = bs.BeautifulSoup(sauce, "lxml")

a = soup.find("div", id="games_list_rows")

print(a)

Thanks for your help!

Answer 1

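The page is loaded dynamically, so fetching it with urllib (or requests) alone does not return the rendered game list. Try Selenium as an alternative way to scrape the page.

Install it with: pip install selenium

Download the ChromeDriver version that matches your installed Chrome from here.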

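from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


url = "https://steamcommunity.com/id/chubaquin/games/?tab=all"
# Selenium 4 takes the driver path through a Service object
# (Selenium 3 accepted it as the first positional argument)
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))
driver.get(url)
# Wait for the page to fully render before parsing it
sleep(5)
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # Close the browser once the HTML has been captured

print(soup.find("div", id="games_list_rows"))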
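To check whether the rows are really added by JavaScript rather than being present in the initial response, one option is to look at the raw HTML exactly as urllib receives it. The sketch below is only a diagnostic aid: the helper names `fetch_raw_html` and `marker_report` are made up for illustration, and the marker strings are taken from the question, so they may not match the page's current markup.

```python
import urllib.request

# Marker strings copied from the question; the live page may use different markup.
DEFAULT_MARKERS = ('id="games_list_rows"', 'class="gameListRow"')


def fetch_raw_html(url):
    """Download the page as plain HTML. No JavaScript is executed."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def marker_report(html, markers=DEFAULT_MARKERS):
    """Return a dict telling which marker strings occur in the given HTML."""
    return {marker: (marker in html) for marker in markers}


# Example usage (requires network access):
# html = fetch_raw_html("https://steamcommunity.com/id/chubaquin/games/?tab=all")
# print(marker_report(html))
```

If the container marker is found but the row marker is not, the rows are most likely filled in by a script after the page loads, which is the situation a browser-driven tool such as Selenium is meant to handle.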

Answer 2

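Have you tried the official Steam Web API? (The xPaw documentation is better than Valve's own.)

You need an API key, but keys are free, and working with the JSON results is much easier than scraping the page, especially because the page layout changes from time to time while the JSON format is far less likely to.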
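As a rough sketch of the API route, the code below builds a request to the `IPlayerService/GetOwnedGames` endpoint and converts the reported playtime into hours. It rests on several assumptions that should be checked against the API documentation: the endpoint URL and query parameters, the `response` / `games` / `playtime_forever` layout of the JSON, and that `playtime_forever` is given in minutes. The function names are hypothetical, and the API key and 64-bit SteamID are placeholders you must supply yourself (a custom profile name such as the one in the question would first have to be resolved to a SteamID).

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the owned-games query; verify against the API docs.
API_URL = "https://api.steampowered.com/IPlayerService/GetOwnedGames/v1/"


def build_owned_games_url(api_key, steam_id):
    """Build the request URL for one user's owned games."""
    params = {
        "key": api_key,
        "steamid": steam_id,
        "include_appinfo": 1,  # ask for game names, not only app IDs
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def playtime_hours(payload):
    """Map each game name to hours played, assuming playtime is in minutes."""
    games = payload.get("response", {}).get("games", [])
    return {
        game.get("name", str(game["appid"])): round(game.get("playtime_forever", 0) / 60, 1)
        for game in games
    }


def fetch_playtime(api_key, steam_id):
    """Call the API and return the playtime mapping (requires network access)."""
    with urllib.request.urlopen(build_owned_games_url(api_key, steam_id)) as resp:
        return playtime_hours(json.load(resp))


# Example usage with placeholder credentials:
# print(fetch_playtime("YOUR_API_KEY", "YOUR_64_BIT_STEAM_ID"))
```

Keeping the parsing in `playtime_hours` separate from the network call makes it easy to try the logic on a saved JSON response before spending API requests.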
