Scraping a website, but want to choose an img URL from a srcset and do it nine more times

I'm trying to scrape **all** of the 'currently playing' images from the BBC Sounds website. I don't mind which size I get; 400w would probably be fine.

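Below are the relevant excerpts of the HTML and my current Python script. A variation of this script works well for the 'now playing' text, but I can't get it to work for the image URLs, which is what I'm after. I think that may be because a) there are too many image URLs to choose from, and b) there is a space, which the parser presumably doesn't like. Please bear in mind that the HTML below is repeated about 10 times, once per channel; I've included just one example. Thanks!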

import requests
from bs4 import BeautifulSoup

url = "https://www.bbc.co.uk/sounds"

r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

g_data = soup.find_all("div", {"class": "sc-o-responsive-image__img sc-u-circle"})

print g_data[0].text
print g_data[1].text
print g_data[2].text
print g_data[3].text
print g_data[4].text
print g_data[5].text
print g_data[6].text
print g_data[7].text
print g_data[8].text
print g_data[9].text


<div class="gel-layout__item sc-o-island"> 
<div class="sc-c-network-item__image sc-o-island" aria-hidden="true"> 
    <div class="sc-c-rsimage sc-o-responsive-image sc-o-responsive-image--1by1 sc-u-circle"> 
<img alt="" class="sc-o-responsive-image__img sc-u-circle" 
    src="https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg" srcSet="https://ichef.bbci.co.uk/images/ic/160x160/p07fzzgr.jpg 160w,
    https://ichef.bbci.co.uk/images/ic/192x192/p07fzzgr.jpg 192w,
    https://ichef.bbci.co.uk/images/ic/224x224/p07fzzgr.jpg 224w,
    https://ichef.bbci.co.uk/images/ic/288x288/p07fzzgr.jpg 288w,
    https://ichef.bbci.co.uk/images/ic/368x368/p07fzzgr.jpg 368w,
    https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg 400w,
    https://ichef.bbci.co.uk/images/ic/448x448/p07fzzgr.jpg 448w,
    https://ichef.bbci.co.uk/images/ic/496x496/p07fzzgr.jpg 496w,
    https://ichef.bbci.co.uk/images/ic/512x512/p07fzzgr.jpg 512w,
    https://ichef.bbci.co.uk/images/ic/576x576/p07fzzgr.jpg 576w,
    https://ichef.bbci.co.uk/images/ic/624x624/p07fzzgr.jpg 624w" 
    sizes="(max-width: 400px) 34vw,(max-width: 600px) 25vw,17vw"/>
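
Each img element already carries the 400x400 URL in its src attribute, so you can read src directly instead of parsing srcset:

import requests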
from bs4 import BeautifulSoup

r = requests.get("https://www.bbc.co.uk/sounds")
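soup = BeautifulSoup(r.text, 'html.parser')

# Match the <img> elements themselves (not a <div>) by their class attribute.
for item in soup.find_all("img", {'class': 'sc-o-responsive-image__img sc-u-circle'}):
    # src already holds the 400x400 variant of each image.
    print(item.get("src"))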

Output:

https://ichef.bbci.co.uk/images/ic/400x400/p05mpj80.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07dg040.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07zml97.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p0428n3t.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p01lyv4b.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06yphh0.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p05v4t1c.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06z9zzc.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06x0hxb.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06n253f.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p060m6jj.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07l4fjw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p03710d6.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07nn0dw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07nn0dw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p078qrgm.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07sq0gr.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07sq0gr.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p03crmyc.jpg
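
If you ever want a size other than the one in src, the srcset value can be split by hand. The following is only a minimal sketch, not something taken from the BBC page or from BeautifulSoup: the helper names `parse_srcset` and `pick_url` are made up for illustration, and it assumes every candidate uses a width descriptor such as `400w` and that no URL contains a comma (true for the excerpt in the question, not guaranteed in general).

```python
def parse_srcset(srcset):
    """Return {width: url} for a srcset string that uses width descriptors like '400w'."""
    candidates = {}
    for entry in srcset.split(","):
        # Each entry looks like "<url> <width>w"; split() also swallows newlines and indentation.
        parts = entry.split()
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            candidates[int(parts[1][:-1])] = parts[0]
    return candidates


def pick_url(srcset, wanted_width):
    """Return the URL whose declared width is closest to wanted_width, or None if nothing parsed."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    best = min(candidates, key=lambda width: abs(width - wanted_width))
    return candidates[best]


# Shortened copy of the srcset value from the question.
sample = (
    "https://ichef.bbci.co.uk/images/ic/160x160/p07fzzgr.jpg 160w,\n"
    "    https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg 400w,\n"
    "    https://ichef.bbci.co.uk/images/ic/624x624/p07fzzgr.jpg 624w"
)
print(pick_url(sample, 400))
```

Inside the loop above you would call it as `pick_url(item.get("srcset", ""), 400)`. The parser lowercases attribute names, so the `srcSet` attribute in the page source is read as `srcset`.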