Why can I access a site through the browser, but get a 403 error when I fetch it with simple code?

Question

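Endpoint: https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2

You can try opening this page in Incognito mode; it works on my end. So I assumed I could fetch data from this site.

I tried copying the request and running it in JavaScript and in Python. However, it does not work: I get a 403 error.

I also tried Burp Suite. I cannot access the site through Burp's browser either.

Also, since it works in Incognito, I don't think it is related to cookies.

Code example (JS):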

import fetch from "node-fetch";

const response = await fetch(
  "https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2",
  {
    headers: {
      accept:
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
      "accept-language": "en",
      "cache-control": "no-cache",
      pragma: "no-cache",
      "sec-ch-ua":
        '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
      "sec-ch-ua-mobile": "?0",
      "sec-ch-ua-platform": '"Linux"',
      "sec-fetch-dest": "document",
      "sec-fetch-mode": "navigate",
      "sec-fetch-site": "none",
      "sec-fetch-user": "?1",
      "upgrade-insecure-requests": "1",
    },
    referrerPolicy: "strict-origin-when-cross-origin",
    body: null,
    method: "GET",
    mode: "cors",
    credentials: "include",
  }
);

// response.status is a plain number, not a promise, so no await is needed.
const status = response.status;
console.log(status);

Code example (Python):

import requests

headers = {
    'authority': 'quizlet.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en',
    'cookie': 'qi5=i2x3g7y1z9a6%3At3vMoQQig2yLcpN.HKWn; qtkn=7gT4DE7pN9URJ2AFDYeaVe; fs=qzkse0; app_session_id=9781a407-4f37-4c09-8e97-8156f182bb45; search_session=%7B%22search_session_id%22%3A%22-2379864199063990974614477b859794%22%2C%22query%22%3A%22overrated%22%2C%22version%22%3A%221.1.1%22%2C%22platform%22%3A%22WEB%22%2C%22depth%22%3Anull%2C%22target_object_type%22%3A%22QImage%22%7D; __cf_bm=cB7hRf6JbcOFZ2kvQ3W12V4bxXiIgn_kF3n87RcI0h0-1631877048-0-Ac+Hi0pATLgW5N3JjqYa7uc5W4ZfDLOumvmCQixWJIKdcVj7stciFh8cYFVTOpr+q5pM2Q7LrXC/LsffOB6Mh2E=; __cfruid=81f16a673e6117331dd4270b3f4f29111590d7d8-1631877048',
}

params = (
    ('query', 'hello'),
    ('perPage', '2'),
)

response = requests.get(
    'https://quizlet.com/webapi/3.2/images/search', headers=headers, params=params)

# NB. Original query string below. It seems impossible to parse and
# reproduce query strings 100% accurately so the one below is given
# in case the reproduced version is not "correct".
# response = requests.get('https://quizlet.com/webapi/3.2/images/search?query=hello&perPage=2', headers=headers)


print(response.status_code)

Please help me. I don't even understand how this can happen (the browser works, but the code does not). Thanks in advance.

Answer 1

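Answering from the Python side: I took a look out of interest, because I am currently developing a REST API and was curious how they protect theirs.

Looking at the traffic in Wireshark, Python's `requests` module does not seem to handle the HTTP request the same way Chrome/Firefox do, and I suspect they use that difference as a signal to serve a captcha.

Anyway, switch from `requests` to the `httpx` module: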
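The kind of difference described above can be observed locally without Wireshark. The following is a minimal sketch, not part of the original answer: it starts a throwaway local server that echoes back the headers it receives, then prints what Python's standard-library `urllib` sends by default. The names `EchoHeaders`, `headers_sent_by`, and `fetch_with_urllib` are made up for illustration. Because the server speaks plain HTTP, it only shows header names, values, and order; differences at the TLS or HTTP/2 level, which a site may also inspect, would not show up here.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to any GET with the request headers, in arrival order, as JSON."""

    def do_GET(self):
        body = json.dumps([[k, v] for k, v in self.headers.items()]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def headers_sent_by(fetch):
    """Start a local echo server, call fetch(url), return the echoed headers."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeaders)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        return json.loads(fetch(url))
    finally:
        server.shutdown()
        server.server_close()


def fetch_with_urllib(url):
    """Fetch url with urllib's defaults, ignoring any proxy settings."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read()


sent = headers_sent_by(fetch_with_urllib)
for name, value in sent:
    print(f"{name}: {value}")
```

Passing a function that fetches the URL with `requests` or `httpx` in place of `fetch_with_urllib` would let you compare clients side by side with what a browser shows in its network tab.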

pip install httpx

https://www.python-httpx.org/

and change the headers to replicate Firefox exactly:

import httpx

headers = [

    ('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
    ('Accept-Encoding','gzip, deflate, br'),
    ('Accept-Language','en-GB,en;q=0.5'),
    ('Cache-Control','max-age=0'),
    ('Connection','keep-alive'),
    ('Host','quizlet.com'),
    ('Sec-Fetch-Dest','document'),
    ('Sec-Fetch-Mode','navigate'),
    ('Sec-Fetch-Site','none'),
    ('Sec-Fetch-User','?1'),
    ('TE','trailers'),
    ('Upgrade-Insecure-Requests','1'),
    ('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'),

]

params = (
    ('query', 'hello'),
    ('perPage', '2'),
)

response = httpx.get('https://quizlet.com/webapi/3.2/images/search', headers=headers, params=params,)

print(response.content)

This gives me the following content rather than the captcha page:

{
    "responses": [{
        "models": {
            "image": [{
                "id": 18957872,
                "personId": 16641862,
                "timestamp": 1416579222,
                "lastModified": 1416579222,
                "code": "Gfg5XS88MRmYq8RS",
                "license": 1,
                "width": 480,
                "height": 360,
                "flickrId": null,
                "flickrOwner": null,
                "_legacyUrl": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA.gif",
                "_legacyUrlSquare": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_s.gif",
                "_legacyUrlSmall": "http://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_m.gif",
                "_secureLegacyUrl": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA.gif",
                "_secureLegacyUrlLarge": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_b.gif",
                "_secureLegacyUrlSquare": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_s.gif",
                "_secureLegacyUrlSmall": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA_m.gif"
            }, {
                "id": 9228314,
                "personId": 513525,
                "timestamp": 1406222781,
                "lastModified": 1406222781,
                "code": "bPHbzaV7KsGWfuXJ",
                "license": 1,
                "width": 298,
                "height": 232,
                "flickrId": null,
                "flickrOwner": null,
                "_legacyUrl": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA.jpg",
                "_legacyUrlSquare": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_s.jpg",
                "_legacyUrlSmall": "http://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_m.jpg",
                "_secureLegacyUrl": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA.jpg",
                "_secureLegacyUrlLarge": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_b.jpg",
                "_secureLegacyUrlSquare": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_s.jpg",
                "_secureLegacyUrlSmall": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA_m.jpg"
            }]
        },
        "paging": {
            "total": 50,
            "page": 1,
            "perPage": 2,
            "token": "UuKKKAkmxv.r4YtwFDuRevZVGAHr"
        }
    }]
}
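Once the request succeeds, the image URLs can be pulled out of the JSON. This is a minimal sketch, assuming the response keeps the shape shown above (`responses`, then `models`, then `image`); the helper name `image_urls` is hypothetical, and `sample` is a trimmed copy of the output above rather than a live response.

```python
import json


def image_urls(payload):
    """Collect the HTTPS URL of every image in a search response."""
    urls = []
    for response in payload.get("responses", []):
        for image in response.get("models", {}).get("image", []):
            url = image.get("_secureLegacyUrl")
            if url:
                urls.append(url)
    return urls


# Trimmed copy of the response shown above (most fields omitted).
sample = json.loads("""
{
    "responses": [{
        "models": {
            "image": [
                {"id": 18957872,
                 "_secureLegacyUrl": "https://o.quizlet.com/cZDE.6rHW7IrGptXSGm8FA.gif"},
                {"id": 9228314,
                 "_secureLegacyUrl": "https://o.quizlet.com/ptqCa7LsKjiVSBVPI3OfTA.jpg"}
            ]
        },
        "paging": {"total": 50, "page": 1, "perPage": 2}
    }]
}
""")

for url in image_urls(sample):
    print(url)
```

With a live response, `image_urls(response.json())` would do the same, provided the status code is checked first so that a 403 page is not parsed as JSON.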

Tags: javascript, python, https, http, request