Scraping Steam via Beautiful Soup, banner art is empty 'trans.gif' file

Question

I'm currently writing some Python code that uses Beautiful Soup to scrape the Steam home page and print some basic information about the games listed there.

from bs4 import BeautifulSoup as soup

page_soup = soup(page_html, 'html.parser')
# Matching on 'tab_item' also matches anchors that carry extra classes
# such as 'app_impression_tracked', so a single query is enough.
container = page_soup.find_all('a', {'class': 'tab_item'})

[...]

    for item in container:
        price = item.find('div', 'tab_item_discount')
        title = item.find('div', 'tab_item_content')
        cover = item.find('div', 'tab_item_cap')
        tags = title.find('div', 'tab_item_top_tags')
        print("price: " + price['data-price-final'])
        print("Title: " + title.div.text)
        print("Cover: " + cover.img['src'])
        print("Tags: " + tags.text)

Output:

price: 0
Title: RetroArch
Cover: https://store.akamai.steamstatic.com/public/shared/images/trans.gif
Tags: Free to Play, Retro, Singleplayer, Multiplayer
price: 5999
Title: DEATHLOOP
Cover: https://store.akamai.steamstatic.com/public/shared/images/trans.gif
Tags: Action, FPS, First-Person, Stealth
[...]

This mostly works, except that the cover (banner image) is scraped as an empty 1x1 'trans.gif' file. I'm not sure what flaw in my code is causing this.

Answer 1

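This isn't a flaw in your code. The 1x1 image is probably a placeholder that JavaScript replaces later, after the page has loaded. BeautifulSoup does not execute JavaScript. If you really need the cover, you'll need something like Selenium to drive a real Chrome instance. That is a fair amount of extra work, so make sure you actually need the cover first.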
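
As a rough illustration of the Selenium route, here is a minimal sketch. It assumes the third-party `selenium` package and a local Chrome install are available, and it assumes that the page's scripts overwrite the `src` of the cover `<img>` in place, which I have not verified for Steam. The names `is_placeholder`, `fetch_rendered_html` and `PLACEHOLDER_NAME` are hypothetical helpers made up for this example.

```python
from urllib.parse import urlparse

PLACEHOLDER_NAME = "trans.gif"  # file name seen in the question's output


def is_placeholder(src):
    """Return True if an image URL points at the transparent placeholder."""
    return urlparse(src or "").path.rsplit("/", 1)[-1] == PLACEHOLDER_NAME


def fetch_rendered_html(url, timeout=10):
    """Load `url` in headless Chrome and return the HTML after scripts ran.

    Hypothetical helper. Selenium is imported here, not at module level,
    so that is_placeholder() stays usable without it installed.
    """
    from selenium import webdriver
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def covers_loaded(d):
            # Assumption: scripts replace the placeholder src in place.
            imgs = d.find_elements(By.CSS_SELECTOR, "a.tab_item .tab_item_cap img")
            return any(not is_placeholder(i.get_attribute("src")) for i in imgs)

        try:
            WebDriverWait(
                driver, timeout,
                ignored_exceptions=[StaleElementReferenceException],
            ).until(covers_loaded)
        except TimeoutException:
            pass  # the assumption did not hold in time; return what we have
        return driver.page_source
    finally:
        driver.quit()
```

You would then call something like `page_html = fetch_rendered_html("https://store.steampowered.com/")` (the URL is my guess at the page being scraped) and pass `page_html` to the existing Beautiful Soup code unchanged. Checking `is_placeholder(cover.img['src'])` in the loop tells you whether a given cover is still the placeholder.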

python

beautifulsoup

steam