Bypass Cloudflare's captcha with headless Chrome using puppeteer on Heroku
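I'm trying to access a site with headless Chrome using puppeteer on Heroku. My setup works when I run it locally on my machine, but when I deploy it to Heroku I get the following:
I know puppeteer has JavaScript enabled by default, and from what I've read this does not appear to be related to it.
I'm using puppeteer-extra-plugin-stealth, random-useragent and viewport randomization, but nothing seems to have any effect.
Does puppeteer and/or Chrome add anything extra when running on Heroku compared to running locally?
Here is my setup: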
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const randomUseragent = require('random-useragent');
const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
let browser = await puppeteer.launch(
{ headless: true, executablePath: process.env.CHROME_BIN || null, args: [
'--enable-features=NetworkService', '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'
], ignoreHTTPSErrors: true, dumpio: false}
);
let page = await browser.newPage();
const userAgent = randomUseragent.getRandom();
const UA = userAgent || USER_AGENT;
//Randomize viewport size
await page.setViewport({
width: 1920 + Math.floor(Math.random() * 100),
height: 3000 + Math.floor(Math.random() * 100),
deviceScaleFactor: 1,
hasTouch: false,
isLandscape: false,
isMobile: false,
});
await page.setUserAgent(UA);
await page.setJavaScriptEnabled(true);
await page.setDefaultNavigationTimeout(0);
await page.goto('https://external.site.example', { waitUntil: 'networkidle0' });
...
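Following Raphael PICCOLO's comment about how the IP address is detected, I managed to solve my problem. Nothing extra was being added or removed on my machine or on Heroku; the only difference was the IP.
I used a proxy, which required proxy-chain to avoid the net::ERR_NO_SUPPORTED_PROXIES error.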
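A likely reason proxy-chain is needed here: as far as I know, Chrome's `--proxy-server` flag does not accept a username and password embedded in the proxy URL, and `proxyChain.anonymizeProxy` works around that by starting a local, credential-free proxy that forwards to the authenticated one. The sketch below only checks whether a proxy URL carries credentials. `describeProxy` and the example URLs are hypothetical, made up for illustration, and are not part of proxy-chain or puppeteer.

```javascript
// Hypothetical helper (not part of proxy-chain or puppeteer).
// Parses a proxy URL with Node's built-in WHATWG URL class and reports
// whether it embeds credentials, which is the case that would need
// proxyChain.anonymizeProxy() before being passed to --proxy-server.
function describeProxy(proxyUrl) {
  const parsed = new URL(proxyUrl);
  return {
    hasCredentials: parsed.username !== '' || parsed.password !== '',
    hostAndPort: parsed.host,
  };
}

// Example URLs are placeholders, not real proxies.
console.log(describeProxy('http://user:secret@proxy.example:8000'));
console.log(describeProxy('http://proxy.example:8000'));
```

If `hasCredentials` is false, the URL can probably be passed to `--proxy-server` directly and the extra local proxy is unnecessary.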
My code ended up like this:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const randomUseragent = require('random-useragent');
const proxyChain = require('proxy-chain');
const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
const oldProxyUrl = process.env.PROXY_SERVER;
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
let browser = await puppeteer.launch(
{ headless: true, executablePath: process.env.CHROME_BIN || null, args: [
'--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${newProxyUrl}`
], ignoreHTTPSErrors: true, dumpio: false}
);
let page = await browser.newPage();
const userAgent = randomUseragent.getRandom();
const UA = userAgent || USER_AGENT;
//Randomize viewport size
await page.setViewport({
width: 1920 + Math.floor(Math.random() * 100),
height: 3000 + Math.floor(Math.random() * 100),
deviceScaleFactor: 1,
hasTouch: false,
isLandscape: false,
isMobile: false,
});
await page.setUserAgent(UA);
await page.setJavaScriptEnabled(true);
await page.setDefaultNavigationTimeout(0);
await page.goto('https://external.site.example', { waitUntil: 'networkidle0' });
...
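One thing the code above leaves open is cleanup: the local forwarding proxy started by `anonymizeProxy` keeps listening until it is closed. A rough sketch of one way to handle that, assuming proxy-chain's `closeAnonymizedProxy(url, closeConnections)`, is shown below. `withAnonymizedProxy` is a hypothetical wrapper, not something from the original answer, and the proxy-chain module is passed in as an argument so the function can be exercised with a stub.

```javascript
// Hypothetical wrapper (an assumption for illustration, not from the answer).
// proxyChain: the object returned by require('proxy-chain'), or a stub
//             exposing anonymizeProxy() and closeAnonymizedProxy().
// upstreamUrl: the authenticated proxy URL, e.g. process.env.PROXY_SERVER.
// task: async function that receives the local, credential-free proxy URL.
async function withAnonymizedProxy(proxyChain, upstreamUrl, task) {
  const localUrl = await proxyChain.anonymizeProxy(upstreamUrl);
  try {
    // Launch the browser with `--proxy-server=${localUrl}` inside task().
    return await task(localUrl);
  } finally {
    // Shut the local proxy down even if task() throws;
    // `true` asks proxy-chain to drop any connections still open.
    await proxyChain.closeAnonymizedProxy(localUrl, true);
  }
}
```

With the real module this would be called as `withAnonymizedProxy(require('proxy-chain'), process.env.PROXY_SERVER, async (proxyUrl) => { ... })`, with the `puppeteer.launch` call, the page work and `browser.close()` placed inside the callback.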