Is there a way to extract data from response.content in Python?

Question

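I am trying to figure out how to scrape/extract an image URL from response.content.

This is the URL I want to extract: <img src="/Content/images/asos-logo-2022-93x28.png"

The problem is that everything after the /Content/images/ part can change...

Any help is appreciated!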

Answer 1

You can use Beautiful Soup for this:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get("
>>> html = r.text
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for item in soup.find_all('img'): print(item['src'])
... 
https://cdn.sstatic.net/Img/teams/teams-illo-free-sidebar-promo.svg?v=47faa659a05e
https://www.gravatar.com/avatar/f96b33e2715bf57ba8e434140f0aeeba?s=64&d=identicon&r=PG&f=1
/posts/71636643/ivc/9bb6
https://sb.scorecardresearch.com/p?c1=2&c2=17440561&cv=3.6.0&cj=1

If you want to match a specific image, see the Beautiful Soup documentation on how to search by CSS class or by any other CSS selector.
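For the case in the question, where only the /Content/images/ prefix is stable, one option is a CSS attribute selector that matches on the start of the src value. The following is a minimal sketch, not a definitive solution: the html string is a made-up stand-in for r.text, and only the first <img> tag in it comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; in real use this would be the
# body of the response. Only the first <img> is from the question.
html = """
<html><body>
  <img src="/Content/images/asos-logo-2022-93x28.png" alt="logo">
  <img src="/static/other/banner.png" alt="banner">
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# [src^="..."] is a CSS attribute selector: it matches elements whose
# src attribute starts with the given prefix, whatever follows it.
matches = soup.select('img[src^="/Content/images/"]')

# Collect the src value of every matching <img> tag.
srcs = [img["src"] for img in matches]
print(srcs)  # ['/Content/images/asos-logo-2022-93x28.png']
```

The extracted value is a relative path. If an absolute URL is needed, urllib.parse.urljoin from the standard library can resolve it against the URL of the page that was requested.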

