Handling GET errors in WWW::Mechanize
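I'm using a script that uses WWW::Mechanize to scrape data from a website, and everything works fine except for the website itself. Sometimes it just stops responding for a while, and for a given my $url = 'http://www.somesite.com/more/url/text' I get this error on $mech->get($url):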
Error GETing http://www.somesite.com/more/url/text: Can't connect to www.somesite.com:443 at ./trackSomesite.pl line 34.
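The error occurs sporadically with no recognizable pattern, and in my experience with websites that points to an unstable server.
I'd like to be able to tell definitively that this particular error occurred, as opposed to some other error such as Too many requests.
My question is: how can I make my script handle this error without dying?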
Wrap your $mech->get(...) request in an eval block, or construct the object with autocheck => 0, and then check the $mech->status code and/or $mech->status_line to decide what to do.
Here is an example:
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
use constant RETRY_MAX => 5;

my $url = 'http://www.xxsomesite.com/more/url/text'; # Cannot connect
# autocheck => 0 makes get() return the error response instead of dying
my $mech = WWW::Mechanize->new( autocheck => 0 );
my $content = fetch($url);

sub fetch {
    my ($url) = @_;
    for my $retry (0 .. RETRY_MAX-1) {
        my $message = "Attempting to fetch [ $url ]";
        $message .= $retry ? " - retry $retry\n" : "\n";
        warn $message;
        my $response = $mech->get($url);
        return $response->content() if $response->is_success();
        my $status = $response->code;    # numeric HTTP status code
        warn "status = $status\n";
        if ($response->status_line =~ /Can't connect/) {
            # Linear backoff: wait 1, 2, 3, ... seconds between attempts
            my $delay = $retry + 1;
            warn "cannot connect...will retry after $delay seconds\n";
            sleep $delay;
        } elsif ($status == 429) {
            warn "too many requests...ignoring\n";
            return undef;
        } else {
            warn "something else...\n";
            return undef;
        }
    }
    warn "giving up...\n";
    return undef;
}
Output:
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ]
status = 500
cannot connect...will retry after 1 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 1
status = 500
cannot connect...will retry after 2 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 2
status = 500
cannot connect...will retry after 3 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 3
status = 500
cannot connect...will retry after 4 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 4
status = 500
cannot connect...will retry after 5 seconds
giving up...
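The example above takes the autocheck => 0 route. For the other option mentioned, wrapping the request in an eval block, a minimal sketch might look like the following. The helper name classify_get and its 'ok' / 'connect' / 'throttled' / 'other' labels are made up for illustration and are not part of WWW::Mechanize. The sketch also assumes that $mech->status still reports the failed request's code after get() has died.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): performs one GET and
# returns a (kind, detail) pair instead of letting the script die.
# The kind labels 'ok', 'connect', 'throttled' and 'other' are made up here.
sub classify_get {
    my ($mech, $url) = @_;

    # With autocheck on, get() dies on an error response; eval traps that.
    my $response = eval { $mech->get($url) };
    my $error    = $@;

    return ('ok', $response->decoded_content)
        if defined $response && $response->is_success;

    # Assumption: status() still reports the failed request's code here.
    my $status = $mech->status // 0;
    $error ||= "HTTP status $status";
    chomp $error;

    return ('connect',   $error) if $error =~ /Can't connect/;
    return ('throttled', $error) if $status == 429;
    return ('other',     $error);
}

# Demo: runs only when a URL is given on the command line.
if (@ARGV) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    my ($kind, $detail) = classify_get($mech, $ARGV[0]);
    warn "$kind: $detail\n" if $kind ne 'ok';
}
```

Run it with a URL as the only argument. A caller could sleep and retry when the kind is 'connect' and give up on anything else, much like the fetch loop above.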