Bypass Cloudflare's captcha with headless chrome using puppeteer on Heroku

I'm trying to access a site with headless Chrome using puppeteer on Heroku. My setup works when I run it locally on my machine, but when I deploy it to Heroku I get the following message:

I know puppeteer has JavaScript enabled by default, and from what I've read, that doesn't seem to be the cause.

I'm using puppeteer-extra-plugin-stealth, random-useragent, and viewport randomization, but nothing seems to help.

Does puppeteer and/or Chrome add anything extra when running on Heroku compared to running locally?

Here is my setup:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const randomUseragent = require('random-useragent');

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';


// Note: the awaits below assume this code runs inside an async function.
let browser = await puppeteer.launch({
  headless: true,
  // Use the Chrome binary from CHROME_BIN when it is set
  executablePath: process.env.CHROME_BIN || null,
  args: [
    '--enable-features=NetworkService',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
  ],
  ignoreHTTPSErrors: true,
  dumpio: false,
});
let page = await browser.newPage();
const userAgent = randomUseragent.getRandom();
const UA = userAgent || USER_AGENT;

//Randomize viewport size
await page.setViewport({
    width: 1920 + Math.floor(Math.random() * 100),
    height: 3000 + Math.floor(Math.random() * 100),
    deviceScaleFactor: 1,
    hasTouch: false,
    isLandscape: false,
    isMobile: false,
});

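await page.setUserAgent(UA);
// JavaScript is already enabled by default; this call only makes it explicit
await page.setJavaScriptEnabled(true);
// A timeout of 0 disables the navigation timeout entirely
page.setDefaultNavigationTimeout(0);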
await page.goto('https://external.site.example', { waitUntil: 'networkidle0' });

...

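Following Raphael PICCOLO's comment about how the IP address is detected, I managed to solve my problem. Neither my machine nor Heroku was adding or removing anything extra; the only difference was the IP.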
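
One way to check that the IP really is the only difference is to load an IP echo service in the same page, locally and on Heroku, and compare the results. This is only a minimal sketch: it assumes `https://api.ipify.org` responds with the caller's public IP as plain text, and the `getPageIp` helper is a hypothetical name that is not part of my original code.

```javascript
// Assumption: this echo service responds with the caller's public IP as plain text.
const IP_ECHO_URL = 'https://api.ipify.org';

// Hypothetical helper: navigates the given puppeteer page to the echo service
// and returns the text that the service reports.
async function getPageIp(page, url = IP_ECHO_URL) {
  await page.goto(url, { waitUntil: 'networkidle0' });
  const text = await page.evaluate(() => document.body.innerText);
  return text.trim();
}
```

Logging `await getPageIp(page)` right after `browser.newPage()` in both environments should show which address the target site sees.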

I used a proxy, which required proxy-chain in order to avoid the net::ERR_NO_SUPPORTED_PROXIES error.
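
As far as I can tell, the error shows up when the proxy URL carries a username and password, because Chrome's `--proxy-server` flag does not accept embedded credentials. The sketch below only illustrates how to tell whether a proxy URL falls into that case; `describeProxy` and the example URL are made up for illustration.

```javascript
// Hypothetical helper: inspects a proxy URL and reports whether it embeds credentials.
function describeProxy(proxyUrl) {
  const { protocol, hostname, port, username, password } = new URL(proxyUrl);
  return {
    hasCredentials: username !== '' || password !== '',
    // The same proxy address with any credentials stripped
    bareUrl: `${protocol}//${hostname}${port ? `:${port}` : ''}`,
  };
}

// Example value is invented for illustration.
console.log(describeProxy('http://user:secret@proxy.example:8000'));
```

When `hasCredentials` is true, going through `proxyChain.anonymizeProxy` gives Chrome a local, credential-free proxy URL to use instead.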

My code ended up like this:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const randomUseragent = require('random-useragent');
const proxyChain = require('proxy-chain');

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';

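// Note: the awaits below assume this code runs inside an async function.
// Upstream proxy URL, read from the PROXY_SERVER environment variable
const oldProxyUrl = process.env.PROXY_SERVER;
// proxy-chain starts a local proxy that forwards to the upstream one and returns its URL
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

let browser = await puppeteer.launch({
  headless: true,
  // Use the Chrome binary from CHROME_BIN when it is set
  executablePath: process.env.CHROME_BIN || null,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    `--proxy-server=${newProxyUrl}`,
  ],
  ignoreHTTPSErrors: true,
  dumpio: false,
});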
let page = await browser.newPage();
const userAgent = randomUseragent.getRandom();
const UA = userAgent || USER_AGENT;

//Randomize viewport size
await page.setViewport({
    width: 1920 + Math.floor(Math.random() * 100),
    height: 3000 + Math.floor(Math.random() * 100),
    deviceScaleFactor: 1,
    hasTouch: false,
    isLandscape: false,
    isMobile: false,
});

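await page.setUserAgent(UA);
// JavaScript is already enabled by default; this call only makes it explicit
await page.setJavaScriptEnabled(true);
// A timeout of 0 disables the navigation timeout entirely
page.setDefaultNavigationTimeout(0);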
await page.goto('https://external.site.example', { waitUntil: 'networkidle0' });

...
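
The code above never shuts down the browser or the local proxy that proxy-chain starts. The following is only a sketch of one way to clean up, not what I originally ran: `withProxiedBrowser` is a hypothetical wrapper, and `puppeteer` and `proxyChain` are passed in as arguments so the example does not depend on how they were required.

```javascript
// Hypothetical wrapper: starts the local anonymized proxy, launches the browser
// through it, runs `task`, and always cleans up afterwards.
async function withProxiedBrowser({ puppeteer, proxyChain, upstreamProxyUrl }, task) {
  const localProxyUrl = await proxyChain.anonymizeProxy(upstreamProxyUrl);
  let browser;
  try {
    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${localProxyUrl}`],
    });
    return await task(browser);
  } finally {
    // Close the browser first, if it was launched successfully
    if (browser) await browser.close();
    // Passing `true` also closes connections still open through the local proxy
    await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
  }
}
```

It could be called as `withProxiedBrowser({ puppeteer, proxyChain, upstreamProxyUrl: process.env.PROXY_SERVER }, async (browser) => { ... })`, with the page setup from above placed inside the callback.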