Can't extract src attribute from "img" tag with BeautifulSoup

Question

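I'm working on a project where I want to extract image URLs from a website. I'm completely new to this, so please bear with me. According to the HTML, the images I want have the class "fotorama__img". However, when I run my code, it doesn't seem to work. Does anyone know why? Also, why doesn't the src attribute contain the whole URL, only part of it? Example: the image link is https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_front.jpg but the img tag's src attribute is "/files_SYS/images/System/sysThumb/SYS-120U-TNR_main.png".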

Here is my code:

from bs4 import BeautifulSoup
import requests 

page = requests.get("https://www.supermicro.com/en/products/system/Ultra/1U/SYS-120U-TNR")
soup = BeautifulSoup(page.content,'lxml')
images = soup.find_all("img", {"class": "fotorama__img"})
for image in images:
    print(image.get("src"))

Here is a screenshot of the page's HTML.

Thanks for your help!

Answer 1

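The class is added dynamically by JavaScript, so BeautifulSoup never sees it: requests downloads only the raw HTML and does not run scripts. To extract the images from this site, you can select the links inside the .fotorama container instead: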

import requests
from bs4 import BeautifulSoup

page = requests.get(
    "https://www.supermicro.com/en/products/system/Ultra/1U/SYS-120U-TNR"
)
soup = BeautifulSoup(page.content, "lxml")
images = [
    "https://www.supermicro.com" + a["href"]
    for a in soup.select(".fotorama > a")
]

print(*images, sep="\n")

Prints:

https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_main.png
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_angle.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_top.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_front.jpg
https://www.supermicro.com/files_SYS/images/System/SYS-120U-TNR_callout_rear.jpg
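
As for why src holds only part of the URL: a value that starts with "/" is a root-relative URL, which the browser resolves against the site the page was loaded from. The snippet above does this by concatenating strings. A slightly more general sketch, assuming you would rather let the standard library resolve it, uses urllib.parse.urljoin. The helper name to_absolute is only illustrative and is not part of the original answer.

```python
from urllib.parse import urljoin

# The page the HTML was downloaded from (taken from the question).
PAGE_URL = "https://www.supermicro.com/en/products/system/Ultra/1U/SYS-120U-TNR"


def to_absolute(page_url: str, src: str) -> str:
    """Resolve a possibly relative src/href against the page it came from.

    Hypothetical helper for illustration; it simply wraps urljoin.
    """
    return urljoin(page_url, src)


# Root-relative value as quoted in the question.
relative_src = "/files_SYS/images/System/sysThumb/SYS-120U-TNR_main.png"
print(to_absolute(PAGE_URL, relative_src))
# https://www.supermicro.com/files_SYS/images/System/sysThumb/SYS-120U-TNR_main.png
```

Unlike plain concatenation, urljoin also leaves a value that is already an absolute URL unchanged, so it is safe to apply to every src or href you collect.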
