Why does python requests.get() retrieve different image src compared to browsing the site

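As the title says: calling requests.get() gives me a different image src link from the one I see when browsing the site manually.

I'm trying to scrape a product website and want to store the images, but the src I get from the site points to a very low-quality, blurry image. I compared the src with the one on the website and they differ. Do I need to pass something to "force" a screen size in the request?

My code is as follows: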

from requests import get
from lxml import html

def demo():
    params = {'page': 0}
    response = get('https://www.checkers.co.za/c-2256/All-Departments', params=params)
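    tree = html.fromstring(response.content)
    # <img> elements inside product links that have a src attribute
    images = tree.xpath('//a[@class="product-listening-click"]/img[@src]')
    # src values are site-relative, so prefix the domain
    images = ['https://www.checkers.co.za' + image.attrib['src'] for image in images]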
    print(images)

Difference between the src of the first link in the list and that of the original image on the site:

site src:
https://www.checkers.co.za/medias/10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wzNTA4MHxpbWFnZS9wbmd8aW1hZ2VzL2g1My9oZmYvODg1NzQ3ODYyNzM1OC5wbmd8YTM4YjE3YmMxYzJjMzI4MmIzMTQ0ZWU1MjlkYjBmNWZjZGFhYzYxYzAyZGMyNDhlNDE0MDhjYWQ0MjQxNmQ3NA

retrieved src:
https://www.checkers.co.za/medias/lqi-10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wxNDkwfGltYWdlL3BuZ3xpbWFnZXMvaDY1L2gwNC85MDgxNzU4NDgyNDYyLnBuZ3w0MmY3ZmMzNzJmYTU0MGIzNDk0ZjdmOTkyODYwMGI3N2I5YWJhZDRkOTljNzViYjIxMWQ3OWU2NDVjZGZhZTdm
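Besides the lqi- prefix on the file name, the two URLs carry different context tokens. Each token is unpadded base64 of a "|"-separated string, so both can be decoded and compared. This is a sketch: reading lqi- as "low-quality image" and treating the third field as the stored file size in bytes are assumptions inferred from these two URLs, not documented behaviour of the site.

```python
import base64
from urllib.parse import parse_qs, urlsplit

SITE_SRC = "https://www.checkers.co.za/medias/10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wzNTA4MHxpbWFnZS9wbmd8aW1hZ2VzL2g1My9oZmYvODg1NzQ3ODYyNzM1OC5wbmd8YTM4YjE3YmMxYzJjMzI4MmIzMTQ0ZWU1MjlkYjBmNWZjZGFhYzYxYzAyZGMyNDhlNDE0MDhjYWQ0MjQxNmQ3NA"
RETRIEVED_SRC = "https://www.checkers.co.za/medias/lqi-10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wxNDkwfGltYWdlL3BuZ3xpbWFnZXMvaDY1L2gwNC85MDgxNzU4NDgyNDYyLnBuZ3w0MmY3ZmMzNzJmYTU0MGIzNDk0ZjdmOTkyODYwMGI3N2I5YWJhZDRkOTljNzViYjIxMWQ3OWU2NDVjZGZhZTdm"


def decode_context(url):
    """Decode the base64 'context' query parameter into its '|'-separated fields."""
    token = parse_qs(urlsplit(url).query)["context"][0]
    token += "=" * (-len(token) % 4)  # the URLs omit base64 padding
    return base64.b64decode(token).decode("ascii").split("|")


for label, url in (("site", SITE_SRC), ("retrieved", RETRIEVED_SRC)):
    fields = decode_context(url)
    # fields[2] is assumed to be the file size in bytes, fields[3] the MIME type
    print(label, urlsplit(url).path, fields[2], fields[3])
```

The site token's third field is 35080 and the retrieved token's is 1490. Under the assumption above, that would mean the script received a file roughly 24 times smaller, which is consistent with the blurry image described in the question.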

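Edit 1:

I tried adding a User-Agent header with the fake-useragent package and iterating through every available one. The src result did not change.

Edit 2:

It seems that parsing with lxml.html instead of bs4 gives a different output for the image's data-original-src. I'm not sure why, but thanks to @AmineBTG for helping me notice this.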

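Note:
With lxml.html, accessing data-original-src through '//a[@class="product-listening-click"]/img/@data-original-src' instead of '//a[@class="product-listening-click"]/img[@data-original-src]' works. The first expression returns the attribute values themselves, while the second only selects the <img> elements that have the attribute. Headers did not appear to be necessary when testing.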
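The difference between the two XPath expressions in the note can be checked offline with a small piece of markup. This is a sketch: the HTML fragment and its /medias/... paths are hypothetical, modelled on the product tiles described in the question.

```python
from lxml import html

# Hypothetical markup modelled on a product tile from the listing page
SAMPLE = (
    '<div>'
    '<a class="product-listening-click" href="/p/1">'
    '<img src="/medias/lqi-example.png" data-original-src="/medias/example.png">'
    '</a>'
    '</div>'
)

tree = html.fromstring(SAMPLE)

# Predicate form: selects the <img> ELEMENTS that have the attribute
elements = tree.xpath('//a[@class="product-listening-click"]/img[@data-original-src]')

# Attribute-axis form: selects the attribute VALUES themselves (strings)
values = tree.xpath('//a[@class="product-listening-click"]/img/@data-original-src')

print(elements[0].tag, elements[0].get("src"))
print([str(v) for v in values])
```

Both forms can reach the full-quality URL. With the predicate form you still have to read the attribute off each element, for example with elements[0].get("data-original-src"). Reading .attrib['src'] instead, as the question's code does, returns the low-quality placeholder.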

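When a web browser sends an HTTP request, it includes a lot of information about itself in the headers. This lets the website return the version of the page best suited to that particular browser. When you make the request through the requests module, the site receives only requests' minimal default headers (such as a python-requests/... User-Agent), so it may send a version of the page that differs slightly from the one you get in the browser.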
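To see what the site actually receives from the script, you can build the request the same way requests.get() would and inspect it without sending anything. This is a sketch that relies on Session.prepare_request, which merges the session's default headers into the request. The exact header values depend on the installed requests version.

```python
import requests

URL = "https://www.checkers.co.za/c-2256/All-Departments"

# Prepare (but do not send) the same request that
# requests.get(URL, params={"page": 0}) would make.
with requests.Session() as session:
    prepared = session.prepare_request(
        requests.Request("GET", URL, params={"page": 0})
    )

print(prepared.url)
# Only requests' defaults: User-Agent, Accept-Encoding, Accept, Connection
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

The output typically shows a User-Agent of the form python-requests/<version> and Accept: */*, with none of the sec-ch-ua or sec-fetch-* headers that a browser adds. Passing a headers dict to requests.get() adds those headers to the request.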

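That is why you get two different image sources depending on how you request the site. The browser gets the higher-quality image because the site has enough information about how the image will be displayed to send the best version. The script gets the lower-quality image because the site sends a much smaller version to reduce traffic.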

It works when you pass the appropriate headers and read the "data-original-src" attribute instead of the "src" attribute. See the code below (slightly modified):

import requests
from bs4 import BeautifulSoup

def demo():
    header ={
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
        "cache-control": "max-age=0",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
    }

    params = {'page': 0}

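    # Request the first listing page with browser-like headers
    r = requests.get('https://www.checkers.co.za/c-2256/All-Departments', headers=header, params=params)
    r.raise_for_status()
    s = BeautifulSoup(r.content, "html.parser")

    # Each product tile wraps its <img> in div.item-product__image;
    # data-original-src holds the full-quality URL, src the low-quality one
    products = s.find_all("div", {"class": "item-product__image"})
    images = []
    for prod in products:
        img = prod.find("img")
        src = img.get("data-original-src") if img else None
        if src:  # skip tiles without the attribute instead of raising TypeError
            images.append('https://www.checkers.co.za' + src)

    return images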

print(demo())
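If you would rather not depend on BeautifulSoup or lxml, the same idea (read data-original-src, fall back to src) can be sketched with the standard library's html.parser. This swaps in a different parser than the answer uses. ProductImageParser is a hypothetical name, and the product-listening-click class is taken from the question's XPath, so it may stop matching if the site's markup changes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.checkers.co.za"


class ProductImageParser(HTMLParser):
    """Collect image URLs from <img> tags inside <a class="product-listening-click">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_product_link = False
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "product-listening-click" in classes:
            self.in_product_link = True
        elif tag == "img" and self.in_product_link:
            # Prefer the full-quality URL; fall back to src if it is missing
            url = attrs.get("data-original-src") or attrs.get("src")
            if url:
                # Resolve site-relative paths against the base URL
                self.images.append(urljoin(self.base_url, url))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_product_link = False


# Hypothetical sample: one product tile plus an unrelated logo image
sample = (
    '<a class="product-listening-click" href="/p/1">'
    '<img src="/medias/lqi-a.png" data-original-src="/medias/a.png"></a>'
    '<img src="/medias/logo.png">'
)
parser = ProductImageParser(BASE_URL)
parser.feed(sample)
print(parser.images)
```

Against the live page you would feed it the response text instead, for example parser.feed(r.text) after the requests.get() call in demo().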

Output:

['https://www.checkers.co.za/medias/10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wzNTA4MHxpbWFnZS9wbmd8aW1hZ2VzL2g1My9oZmYvODg1NzQ3ODYyNzM1OC5wbmd8YTM4YjE3YmMxYzJjMzI4MmIzMTQ0ZWU1MjlkYjBmNWZjZGFhYzYxYzAyZGMyNDhlNDE0MDhjYWQ0MjQxNmQ3NA', 'https://www.checkers.co.za/medias/10136301EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1ODA4OHxpbWFnZS9wbmd8aW1hZ2VzL2g2ZS9oMGIvOTA5NjUxNTA5MjUxMC5wbmd8ZTViYzUzY2FiOWIyNWNmYmY0OGQ0ZGY0ZmY2ZDQwMGI3Nzk4ODMwOGYzMWRhNjIxOGZmM2Y1ZTExNDgxZWZjMg', 'https://www.checkers.co.za/medias/10151456EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w4NDE0NHxpbWFnZS9wbmd8aW1hZ2VzL2gzNC9oMjIvODg1NzgxMjU5ODgxNC5wbmd8OWMxYjE4Nzc0MjNkZTU2ZGI5ZDZmN2Q2M2FhMTdhZmM3Yjc4NDgwMzEwMjg1NmNiZTM1YWNjZjkxOTUyMzhmNQ', 'https://www.checkers.co.za/medias/10151458EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1MTc0MnxpbWFnZS9wbmd8aW1hZ2VzL2g2MS9oYmEvODg1Nzg5Nzk5MjIyMi5wbmd8ZDFhNTlkMmJiNGY3ZDA3YjU5NjkwZGMxMjY3ZTgzMDVjYzFkMDkxNzI4NzlmN2U2MjhjZGJmYjE1NDg5ZDU2ZA', 'https://www.checkers.co.za/medias/10143000EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w0NjYyN3xpbWFnZS9wbmd8aW1hZ2VzL2g2MC9oZDIvODg1NzY3NDA4ODQ3OC5wbmd8ZDAzYzk5ZGFkY2Y1NjBiOTllZGJjOTVkZTUwNTg3NjBhYTM4NTk0OGFiYzk4OWRlNDQxZTdkNjQzNWM5YmU1NA', 'https://www.checkers.co.za/medias/10136298EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1NzYzNXxpbWFnZS9wbmd8aW1hZ2VzL2g3MC9oZWEvOTA5NTI5NTY2NDE1OC5wbmd8NDVkZmY4YWU4NWY3MDliYjYyNTk4MWM1NzIyOTNlMjYwMzEwOGFiNGNiZTEzNGRhMmVkZjNiZTU0ZjNiYThiMw', 'https://www.checkers.co.za/medias/10151462EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MzY1MHxpbWFnZS9wbmd8aW1hZ2VzL2g4Zi9oMjcvODg1NzgxNDU2NDg5NC5wbmd8ZTE5ZmViZTdjNDNkOTVjMjZkODYwMjA4YTczNTgwNjU5ZmViMmE4OTQ4YzUwYjgwMjI0ZGJkNzJkNTI5OGU0Mg', 
'https://www.checkers.co.za/medias/10241929EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MjMwOXxpbWFnZS9wbmd8aW1hZ2VzL2g0NS9oY2QvODg1ODQ2ODM4NDc5OC5wbmd8YzE1MGZkOWI2MjAyOWVlNzQ2YmRkMWM2MDNhZTk3ZGFkYWY4ZWMxNzA5Njc5OTMxNzY3OWEyNzg5MzczZmM1ZA', 'https://www.checkers.co.za/medias/10165121EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2NTUzMXxpbWFnZS9wbmd8aW1hZ2VzL2gyZC9oYjcvODg2NDkyODM2NjYyMi5wbmd8NzEyODllNDlmZjE1NjJmMzEyMmU4MTU4NWQ4ZjRjYmM1Nzc2NWNjM2Y2YmFmZGQ1N2Y5ZjFmOTY0ZjBkMGE1OQ', 'https://www.checkers.co.za/medias/10145817EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wyMzU0MHxpbWFnZS9wbmd8aW1hZ2VzL2gwMy9oZDIvODg1Nzc0ODc5OTUxOC5wbmd8NGQxZjA0OWNkZTVjY2JmNTI2ZTdlOGIwZjFiMmE5MGFhYjQ2NjZhMDBiYWMyNzVhYjMxNTI4YTZjYWU3OWZhZA', 'https://www.checkers.co.za/medias/10151065EAv2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MDkyNHxpbWFnZS9wbmd8aW1hZ2VzL2g2Ny9oMDEvOTI3ODc2MDE4OTk4Mi5wbmd8YWQ1N2E4Y2ZmNTQ3YzA1ZDdmODcyZDlmZTg4ZGUwZGJhOWQ3ZGNiNWI5ZmE4OWFmOTVkZDgyYjEzOTUyZjlhYg', 'https://www.checkers.co.za/medias/10126789EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wxMDYzMDh8aW1hZ2UvcG5nfGltYWdlcy9oNzcvaDAzLzg4NjA4NzYzMDg1MTAucG5nfDVhMTE1MmE5YmMxNjE0OGZmM2IwOTcxMWQzYWIyY2IxOTU2MmY1M2M2N2MzZjc5ZDE2YWFmNGFiZjdiOTI1YzY', 'https://www.checkers.co.za/medias/10241933EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2NDQ4NXxpbWFnZS9wbmd8aW1hZ2VzL2gzNy9oMTkvODg1ODQwMjg4MTU2Ni5wbmd8NDA5MDRlMDZkM2U3M2JiODUwNWVmYmE5YzM3NjQ3NDkxYTMzZmI0ZjY3OWFkNDZiODU2YjQ2NTRjOTQyNjI2MQ', 'https://www.checkers.co.za/medias/10147193EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MDk2M3xpbWFnZS9wbmd8aW1hZ2VzL2g4Zi9oNTUvODk1NjcwODIyNTA1NC5wbmd8YmFkMTgzZDFkNGRiY2ViMzU4MjNhMGY0ZmM1OTgyY2U4NTY5MGZiZWI5ZTMzNjE1ZTNkY2Q1YzAwY2JhZTgyYw', 
'https://www.checkers.co.za/medias/10164636EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w0MDM4NnxpbWFnZS9wbmd8aW1hZ2VzL2gxMS9oYWIvODg1ODA5NTU4MzI2Mi5wbmd8YTYxY2ZkNjAzOTg2NGVmNGMxODVjNmRkNTAyYmYzOWM2ZDU0MzgyYzk3YjM2YWUzOTRkMGEwOWE0NmVjMDQ0NA', 'https://www.checkers.co.za/medias/10136574EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1MzU0NXxpbWFnZS9wbmd8aW1hZ2VzL2hjMi9oNjEvODg1NzQzMjk4MTUzNC5wbmd8YTJmZGU0ZGVjNDU1NTIyMzU0NGM1ZTQzOTQ4OTUwNmEwN2I3MDc5MjliOWNkNmRlNTgzZDMxYjdkMmNjNTIyYw', 'https://www.checkers.co.za/medias/10604301EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1OTUxN3xpbWFnZS9wbmd8aW1hZ2VzL2hiZi9oZDQvODg1OTg2MzY0NjIzOC5wbmd8MTE5ZTQ4NzJkMzhjMzczMDk0MTE4YTZhZTllMTFlYjBiMTUyY2IzNjIyMmM4NzFlODA0MTU1Yzg3ZWNkMjMyNQ', 'https://www.checkers.co.za/medias/10145422EA-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w3OTc0MHxpbWFnZS9wbmd8aW1hZ2VzL2gxYy9oNTIvODk2MjM0ODIyMDQ0Ni5wbmd8MDc5NjY1YzY2NGE0NDFiNWRiN2NkNWZkMWJlODg5MDhlOWUyZWNhNDEzMWJiZTQ3MjM5MDYyZjgzZWYyYWM2Mg', 'https://www.checkers.co.za/medias/10136291EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w5NjM3NnxpbWFnZS9wbmd8aW1hZ2VzL2hjNi9oMzUvODg1NzQzMDU1NjcwMi5wbmd8ZDQwZjEzMmU5Y2JkMDhkNGM2MGQ4ZTc1MWY0Y2Q5YTJhZWI2YmM2YmY5YjNiYWEyZjQ0YWQ5ZDgyMmE3ZWE2YQ', 'https://www.checkers.co.za/medias/10148833EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w3NTU3NnxpbWFnZS9wbmd8aW1hZ2VzL2g1OC9oNzIvODk1NjY5NjQ2MTM0Mi5wbmd8N2YyZmMxMjA5ZjkxZjkzZWExN2E2MGE1ZTZiZjI0M2FkMDcxZTVlMzY0ZjAzOTRjMjAzNzRjYWQ5Yzk4NjZkNQ']