How to go multithreaded with Puppeteer using worker_threads for web automation
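Hi, I'm doing some web automation and I want to run Puppeteer multithreaded, by which I mean opening the same page 10 times. From what I've read, worker threads seem to be the best solution, I guess? But I don't know how to use them properly. Here is a sample of the code I have so far: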
const { Worker, isMainThread } = require('worker_threads');
const puppeteer = require('puppeteer');

// Launch a browser, open the page, and log how long the navigation took (ms).
let scrapt = async () => {
  /* -------------------------------------------------------------------------- */
  /* Launching puppeteer */
  /* -------------------------------------------------------------------------- */
  try {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.setUserAgent(
      `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36`
    );
    let Browser_b = new Date();
    await page.goto('https://www.supremenewyork.com/');
    let browser_e = new Date();
    console.log(browser_e - Browser_b);
  } catch (e) {
    console.log(e);
  }
};

let ex = [1, 2, 3, 4];
if (isMainThread) {
  // This re-loads the current file inside a Worker instance.
  new Worker(__filename);
} else {
  // Inside the single worker: start one browser per entry in `ex`.
  for (let val of ex) {
    scrapt();
  }
}
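This script opens 4 browsers, but if I open more, my PC lags a lot. I think that is because it only uses one thread instead of all of them?
Thanks in advance, and sorry if this is a stupid question.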
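A note on what the snippet above does: it creates a single Worker, and all four scrapt() calls run inside that one thread. Each puppeteer.launch() also starts a separate Chromium process, so much of the load likely comes from the browsers rather than from the Node.js thread. The following is only a minimal sketch of starting one worker per job, not a tuned solution. The names buildJobs, runJobInWorker, and timePageLoad and the `--run` flag are hypothetical, and it assumes puppeteer is installed.

```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// Hypothetical target: the same page opened several times, as in the question.
const TARGET_URL = 'https://www.supremenewyork.com/';

// Build `count` job descriptions that all point at the same URL.
function buildJobs(url, count) {
  return Array.from({ length: count }, (_, i) => ({ id: i + 1, url }));
}

// Open `url` in a fresh browser and return the navigation time in milliseconds.
async function timePageLoad(url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when a job runs
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const start = Date.now();
    await page.goto(url);
    return Date.now() - start;
  } finally {
    await browser.close(); // always release the Chromium process
  }
}

// Start one Worker for `job` and settle when it reports back or dies.
function runJobInWorker(job) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(__filename, { workerData: job });
    worker.once('message', resolve);
    worker.once('error', reject);
    // Rejecting after the promise has already resolved is a no-op.
    worker.once('exit', (code) => reject(new Error(`Worker exited with code ${code}`)));
  });
}

async function main() {
  const jobs = buildJobs(TARGET_URL, 10);
  const results = await Promise.allSettled(jobs.map(runJobInWorker));
  for (const r of results) {
    console.log(r.status === 'fulfilled' ? r.value : r.reason);
  }
}

if (!isMainThread) {
  // Worker side: run the single job passed in via workerData.
  timePageLoad(workerData.url)
    .then((ms) => parentPort.postMessage({ id: workerData.id, ms }))
    .catch((err) => parentPort.postMessage({ id: workerData.id, error: String(err) }));
} else if (process.argv.includes('--run')) {
  // Main thread: only start browsers when asked to, e.g. `node scrape.js --run`.
  main();
}
```

Ten simultaneous browsers can still be heavy on a typical machine, so lowering the job count or running the jobs in smaller batches may be needed.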
Have you tried using cluster? It is a good way to do multiprocessing and, in my opinion, easier to use than worker_threads. This is an example from HERE:
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
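The example above serves HTTP rather than driving a browser, so here is a rough sketch of how the same cluster pattern might be adapted to the Puppeteer case from the question. It is an illustration under assumptions, not a tested recipe: jobsPerWorker, timePageLoad, the JOB_COUNT environment variable, and the `--run` flag are hypothetical names, and it assumes puppeteer is installed.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical workload: the page from the question, opened 10 times in total.
const TARGET_URL = 'https://www.supremenewyork.com/';
const TOTAL_JOBS = 10;

// Split `total` jobs as evenly as possible across `workers` processes.
// Example: jobsPerWorker(10, 4) returns [3, 3, 2, 2].
function jobsPerWorker(total, workers) {
  const base = Math.floor(total / workers);
  const extra = total % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < extra ? 1 : 0));
}

// Open `url` in a fresh browser and return the navigation time in milliseconds.
async function timePageLoad(url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only in worker processes
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const start = Date.now();
    await page.goto(url);
    return Date.now() - start;
  } finally {
    await browser.close();
  }
}

// Primary process: fork up to one worker per CPU and tell each how many jobs to run.
function runPrimary() {
  const workerCount = Math.min(os.cpus().length, TOTAL_JOBS);
  for (const count of jobsPerWorker(TOTAL_JOBS, workerCount)) {
    cluster.fork({ JOB_COUNT: String(count) });
  }
  cluster.on('exit', (worker, code) => {
    console.log(`worker ${worker.process.pid} exited with code ${code}`);
  });
}

// Worker process: run its share of the jobs one after another, then exit.
async function runWorker() {
  const count = Number(process.env.JOB_COUNT || 0);
  for (let i = 0; i < count; i++) {
    try {
      const ms = await timePageLoad(TARGET_URL);
      console.log(`worker ${process.pid}: job ${i + 1}/${count} took ${ms} ms`);
    } catch (err) {
      console.error(`worker ${process.pid}: job ${i + 1}/${count} failed:`, err);
    }
  }
  process.exit(0);
}

if (cluster.isWorker) {
  runWorker();
} else if (process.argv.includes('--run')) {
  // Only start browsers when asked to, e.g. `node scrape.js --run`.
  runPrimary();
}
```

Because each worker process handles its jobs sequentially, at most one browser per CPU core is open at a time, which may help with the lag described in the question.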