Python Requests-HTML extracting SRC

Question

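I was wondering if anyone could help. I searched the requests-html documentation without success: https://requests.readthedocs.io/projects/requests-html/en/latest/

Previously I was using requests and Beautiful Soup, but the website I am scraping has now implemented JavaScript. I have managed to extract the text using Requests-HTML, but I am not sure how to extract the image SRC.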

from requests_html import HTMLSession

session = HTMLSession()
R = session.get(SHOPURL, headers=headers)
images = R.html.find(SELECTOR)  # SELECTOR: site-specific CSS selector, given only as "#website information" in the original
for image in images:
    print(image)

For every image present, this is what is returned:

<Element 'img' _ngcontent-app-c164='' deferload=''>

The image file name on the website is stored under "src".

Answer 1

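The attrs property of the Element class is what you are looking for: it is a dictionary containing all of the element's attributes. For an img element (or tag), the "src" attribute will contain the path to the image. So: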

for image in images:
    src = image.attrs["src"]
    print(src)


Output:
/img/logo.png
/img/header.png
http://www.website.com/img/hero_background.png
...
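
One caveat: image.attrs["src"] raises a KeyError for a tag that has no src attribute, which appears to be the case for the <Element 'img' ... deferload=''> shown in the question. A minimal sketch of a more forgiving loop follows, assuming you also want relative paths turned into absolute URLs. The names image_sources and base_url are made up for this illustration and are not part of requests-html.

```python
from urllib.parse import urljoin


def image_sources(elements, base_url):
    """Return absolute image URLs for elements that carry a src attribute."""
    sources = []
    for element in elements:
        # .get() returns None instead of raising KeyError when src is absent,
        # e.g. for an <img deferload=''> tag that has not been filled in yet.
        src = element.attrs.get("src")
        if src:
            # urljoin resolves relative paths against base_url and leaves
            # already-absolute URLs unchanged.
            sources.append(urljoin(base_url, src))
    return sources
```

With the question's variables this would be called as image_sources(images, R.url). Elements without a src are skipped silently, so an empty result suggests the attribute is being filled in later by JavaScript.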

Answer 2

The images were lazy-loaded, and had to be authenticated using header information after an API request.
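
If the images are lazy-loaded, the src attribute may simply be absent until the page's JavaScript has run. The following is a rough sketch only, assuming that calling render() from requests-html is enough to make the page fill in src; that is not confirmed for this site. The function names, the "img" default selector, and the scrolldown and sleep values are guesses for illustration. As far as I can tell, render() reloads the page in headless Chromium, so custom headers passed to session.get() may not carry over to that reload.

```python
def missing_src(elements):
    """Return the elements that have no usable src attribute yet."""
    return [el for el in elements if not el.attrs.get("src")]


def find_images_after_render(url, headers=None, selector="img"):
    """Fetch url, run its JavaScript if any image lacks src, return the elements."""
    # Imported lazily so missing_src() can be used without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get(url, headers=headers)
    images = r.html.find(selector)
    if missing_src(images):
        # render() executes the page's JavaScript in headless Chromium
        # (downloaded on first use). scrolldown and sleep are guessed values
        # meant to give lazy-loaded images a chance to appear.
        r.html.render(scrolldown=2, sleep=1)
        images = r.html.find(selector)
    return images
```

If src is still missing after rendering, the image URLs probably come from the API request this answer mentions, and that request would need the right headers; the details are site-specific and not shown here.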
