Puppeteer: add basic auth header for main page domain only, not for 3rd party requests
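I'm trying to get Puppeteer to send an Authorization header, without receiving a challenge, for first/second-party requests only, i.e. not to third parties, and without unintended consequences. The main goal is to authenticate where required while avoiding leaking the killer combination of Authorization + Referer.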
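Using page.authenticate() doesn't work, because it requires a challenge. Using page.setExtraHTTPHeaders() sets the header, but the header is then sent to third parties as well. Using page.setRequestInterception() lets me introduce some conditional logic and does solve the main goal, but it seems to add a bunch of complexity and unintended consequences (e.g. around caching).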
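For reference, a minimal sketch of the kind of conditional logic meant here, assuming "first party" means the same origin (scheme, host and port) as the page being loaded. `isFirstParty` and `attachAuth` are hypothetical helper names, not part of Puppeteer; only `page.setRequestInterception()`, `page.on('request', ...)`, `request.url()`, `request.headers()` and `request.continue()` are Puppeteer calls.

```javascript
// Hypothetical helper: true when requestUrl has the same origin
// (scheme + host + port) as pageUrl.
function isFirstParty(requestUrl, pageUrl) {
  try {
    return new URL(requestUrl).origin === new URL(pageUrl).origin;
  } catch (err) {
    // Unparseable URLs are treated as third party.
    return false;
  }
}

// Hypothetical wiring: add Authorization only to first-party requests.
async function attachAuth(page, pageUrl, authValue) {
  await page.setRequestInterception(true);
  page.on('request', request => {
    if (!isFirstParty(request.url(), pageUrl)) {
      // Third party: pass the request through untouched.
      request.continue();
      return;
    }
    request.continue({
      headers: { ...request.headers(), Authorization: authValue },
    });
  });
}
```

This is the approach that works but brings the interception side effects mentioned above.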
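My specific use case is around webfonts, fwiw.
Here is how I confirmed that the extra header set via page.setExtraHTTPHeaders() is sent to third parties (httpbin in this case).
Serve a simple page with an iframe for httpbin.org/headers: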
var http = require('http')
http.createServer(function (request, response) {
  console.log(request.headers)
  response.writeHead(200)
  response.end('<iframe src="http://httpbin.org/headers" width="100%" height="100%"></iframe>\n')
}).listen(8000)
Fetch that page with Puppeteer:
const puppeteer = require('puppeteer');
const url = 'http://localhost:8000';
(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.setExtraHTTPHeaders({ Authorization: 'Basic dXNlcjpwYXNz' })
  //await page.authenticate({ username: 'user', password: 'pass' })
  await page.goto(url)
  await page.screenshot({path: '/tmp/headers.png'})
  await browser.close()
})()
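The `Basic dXNlcjpwYXNz` value in the script above is the base64 encoding of `user:pass`. A small sketch of how such a value can be built in Node; `basicAuth` is a hypothetical helper name, not a Puppeteer API.

```javascript
// Hypothetical helper: build a Basic Authorization header value
// from a username and password.
function basicAuth(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

console.log(basicAuth('user', 'pass')); // Basic dXNlcjpwYXNz
```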
Content of the httpbin.org/headers response (captured on the wire using tcpflow -c):
{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB",
    "Authorization": "Basic dXNlcjpwYXNz", <----- Authorization is forwarded
    "Host": "httpbin.org",
    "Referer": "http://localhost:8000/",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/83.0.4103.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5ecdb903-0c61b77370a47d894aa8aa7c"
  }
}
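You can use the request.isNavigationRequest() method to skip any requests that are not bound for the main domain when applying auth headers and the like.
A similar issue was reported on the Puppeteer GitHub project, which led to this method being added, and the author gave this example usage: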
// enable request interception
await page.setRequestInterception(true);
// add header for the navigation requests
page.on('request', request => {
  // Do nothing in case of non-navigation requests.
  if (!request.isNavigationRequest()) {
    request.continue();
    return;
  }
  // Add a new header for navigation request.
  const headers = request.headers();
  headers['X-Just-Must-Be-Request-In-Main-Request'] = 1;
  request.continue({ headers });
});
// navigate to the website
await page.goto('https://example.com');
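One caveat worth checking: as far as I know, isNavigationRequest() is also true for the request that drives an iframe's navigation, so in the question's iframe test it would probably not be enough on its own, and it would not cover subresources such as webfonts. A hedged sketch that instead restricts the header by host; `makeAuthHandler` and the allow-list are hypothetical and not taken from the GitHub example.

```javascript
// Hypothetical factory: returns a 'request' handler that adds the
// Authorization header only for hosts on an explicit allow-list.
function makeAuthHandler(allowedHosts, authValue) {
  const allowed = new Set(allowedHosts);
  return request => {
    let host = null;
    try {
      host = new URL(request.url()).host; // includes the port, e.g. 'localhost:8000'
    } catch (err) {
      // Leave host as null so the request is passed through untouched.
    }
    if (host === null || !allowed.has(host)) {
      request.continue();
      return;
    }
    request.continue({
      headers: { ...request.headers(), Authorization: authValue },
    });
  };
}

// Usage with Puppeteer (assumes an existing `page`):
//   await page.setRequestInterception(true);
//   page.on('request', makeAuthHandler(['localhost:8000'], 'Basic dXNlcjpwYXNz'));
```

An allow-list rather than a single origin leaves room for "second-party" hosts you control, such as a separate font or asset domain.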