npm urllib ResponseTimeoutError - how to increase timeout?
I'm trying to use urllib to visit many different URLs, fetch some HTML, and parse it. I have a loop that should make about 5000 requests with urllib:
urllib.request('a url here', options=[timeout=50000]).then(function (result) {
  // data is Buffer instance
  var $ = cheerio.load(result.data);
  $('dt').each(function () {
    var news_html = cheerio.load($(this).html());
    if (news_html('span.timestamp').html() != null) {
      var date = news_html('span.timestamp').html();
      var description = news_html('.story_title').html();
      var link = news_html('a').attr('href');
      var post = {description: description, date: date, link: link};
      pool.query('INSERT INTO db SET ?', post, function (err, result) {
        if (err) {
          console.log(err);
        }
      });
    }
  });
}).catch(function (err) {
  console.error(err);
});
After about 100 iterations, I get this error:
{ ResponseTimeoutError: Response timeout for 5000ms, GET http://www.streetinsider.com/stock_lookup_news.php?q=CREG&type=major_news -1 (connected: true, keepalive socket: false)
headers: {}
at Timeout._onTimeout (/Users/max/projects/stock-news-angular/node_modules/urllib/lib/urllib.js:715:15)
at tryOnTimeout (timers.js:232:11)
at Timer.listOnTimeout (timers.js:202:5)
name: 'ResponseTimeoutError',
requestId: 595,
data: undefined,
path: '/stock_lookup_news.php?q=CREG&type=major_news',
status: -1,
headers: {},
res:
{ status: -1,
statusCode: -1,
headers: {},
size: 0,
aborted: false,
rt: 10030,
keepAliveSocket: false,
data: undefined,
requestUrls: [ 'http://www.streetinsider.com/stock_lookup_news.php?q=CREG&type=major_news' ],
timing: null,
remoteAddress: '162.242.133.50',
remotePort: 80 } }
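How can I increase the timeout so that the loop completes and all the data I need is inserted into my MySQL database? I suspect I'm not setting the timeout correctly, since npm's urllib does have an option for it.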
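I think passing the timeout option to request() correctly will solve your problem. The options argument must be a plain object, for example { timeout: 50000 }. The expression options=[timeout=50000] in your code evaluates to the array [50000], which has no timeout property, so urllib falls back to its 5-second default. That matches the "Response timeout for 5000ms" in your error.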
From the API docs:
timeout Number | Array - Request timeout in milliseconds for connecting phase and response receiving phase. Defaults to exports.TIMEOUT, both are 5s. You can use timeout: 5000 to tell urllib use same timeout on two phase or set them separately such as timeout: [3000, 5000], which will set connecting timeout to 3s and response 5s.
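To make the difference concrete, here is a minimal sketch of how the options object might be built and passed. The helper names buildOptions and fetchHtml are hypothetical (not part of urllib), and the sketch assumes the promise-returning request(url, options) form used in the question. The client is passed in as an argument, so the snippet runs without urllib installed.

```javascript
// Build a urllib options object. `timeout` may be a single number (ms) or
// a [connectTimeout, responseTimeout] pair, per the API docs quoted above.
// buildOptions is a hypothetical helper, not a urllib function.
function buildOptions(connectMs, responseMs) {
  return { timeout: [connectMs, responseMs] };
}

// Hypothetical helper: `client` is anything with a promise-returning
// request(url, options) method, such as the urllib module in the question.
function fetchHtml(client, url, options) {
  return client.request(url, options).then(function (result) {
    // result.data is a Buffer, as in the question's code
    return result.data.toString();
  });
}

// What the question's call actually passes: `options=[timeout=50000]`
// evaluates to the array [50000], which has no `timeout` property.
var broken = [50000];
console.log(broken.timeout); // undefined

// A plain object with separate connect and response timeouts instead.
var fixed = buildOptions(10000, 50000);
console.log(fixed.timeout);
```

With urllib installed, the call would then look something like fetchHtml(require('urllib'), 'a url here', fixed). The 10s and 50s values are only illustrative. If the remote server is slowing down under thousands of requests, a longer timeout may delay the error rather than remove it.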