Puppeteer: add basic auth header for main page domain only, not for 3rd party requests

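I'm trying to get Puppeteer to send the Authorization header, without receiving a challenge, for first/second-party requests only, i.e. not to 3rd parties, and without unintended consequences. The main goal is to authenticate where needed and avoid leaking the killer combination of Authorization + Referer.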

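Using page.authenticate() doesn't work because it requires a challenge. Using page.setExtraHTTPHeaders() sets the header, but then sends it to third parties too. Using page.setRequestInterception() lets me introduce some conditional logic and does solve the main goal, but it seems to add a bunch of complexity and unintended consequences (e.g. around caching).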

My specific use case is around webfonts, fwiw.

Here is how I confirmed that the extra header set via page.setExtraHTTPHeaders is sent to a third party (httpbin in this case).

Serve a simple page with an iframe pointing at httpbin.org/headers:

var http = require('http')

http.createServer(function (request, response) {
    console.log(request.headers)
    response.writeHead(200)
    response.end('<iframe src="http://httpbin.org/headers" width="100%" height="100%"></iframe>\n')
}).listen(8000)

Fetch that page with puppeteer:

const puppeteer = require('puppeteer');
const url = 'http://localhost:8000';

(async () => {
  const browser = await puppeteer.launch()

  const page = await browser.newPage()

  await page.setExtraHTTPHeaders({ Authorization: 'Basic dXNlcjpwYXNz' })
  //await page.authenticate({ username: 'user', password: 'pass' })
  await page.goto(url)
  await page.screenshot({path: '/tmp/headers.png'})

  await browser.close()
})()
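For reference, the hard-coded value above is just the base64 encoding of `user:pass`. A minimal sketch of how such a value could be built in Node; `basicAuthHeader` is a hypothetical helper name, not part of Puppeteer:

```javascript
// Hypothetical helper: build the value of a Basic Authorization header.
// Uses only Node's built-in Buffer; nothing here is Puppeteer API.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

console.log(basicAuthHeader('user', 'pass')); // prints "Basic dXNlcjpwYXNz"
```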

Contents of the httpbin.org/headers response (captured on the wire using tcpflow -c):

 {
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en-GB", 
    "Authorization": "Basic dXNlcjpwYXNz",  <----- Authorization is forwarded
    "Host": "httpbin.org", 
    "Referer": "http://localhost:8000/", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/83.0.4103.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-5ecdb903-0c61b77370a47d894aa8aa7c"
  }
}

You can use the request.isNavigationRequest() method to filter out any requests that are not bound for the main domain when applying auth headers and the like.

A similar issue was reported on the Puppeteer GitHub project, which led to this method being added, and the author gave this example usage:

    // enable request interception
    await page.setRequestInterception(true);
    // add header for the navigation requests
    page.on('request', request => {
      // Do nothing in case of non-navigation requests.
      if (!request.isNavigationRequest()) {
        request.continue();
        return;
      }
      // Add a new header for navigation request.
      const headers = request.headers();
      headers['X-Just-Must-Be-Request-In-Main-Request'] = 1;
      request.continue({ headers });
    });
    // navigate to the website
    await page.goto('https://example.com');
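One caveat, as far as I understand the API: isNavigationRequest() is also true for the request that drives an iframe's navigation, and it is false for subresources such as webfonts, so it may not draw the line exactly between first and third party. A minimal sketch of an origin-based check instead, assuming "first party" means "same origin as the page you navigate to". `sameOrigin` and `makeAuthInterceptor` are hypothetical helper names; only request.url(), request.headers() and request.continue() are Puppeteer calls:

```javascript
// Hypothetical helper: true when both URLs share scheme, host and port.
function sameOrigin(requestUrl, mainUrl) {
  try {
    return new URL(requestUrl).origin === new URL(mainUrl).origin;
  } catch (err) {
    return false; // unparsable URL: treat it as third party
  }
}

// Hypothetical helper: returns a handler for page.on('request', ...).
// Assumption: the Authorization header should go to mainUrl's origin only.
function makeAuthInterceptor(mainUrl, authorization) {
  return (request) => {
    if (!sameOrigin(request.url(), mainUrl)) {
      request.continue(); // third party: pass through untouched
      return;
    }
    // First party: copy the existing headers and add Authorization.
    const headers = { ...request.headers(), Authorization: authorization };
    request.continue({ headers });
  };
}

// Intended wiring, inside the async function from the question:
//   await page.setRequestInterception(true);
//   page.on('request', makeAuthInterceptor(url, 'Basic dXNlcjpwYXNz'));
//   await page.goto(url);
```

If second-party hosts (for example a separate font host you control) also need the header, the equality check could be swapped for a lookup in a set of allowed origins. This still relies on request interception, so the complexity and caching concerns raised in the question apply.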