Perl LWP::UserAgent: simulating a browser
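I am trying to fetch a web page automatically with LWP::UserAgent, but I get a 403 Forbidden error, whereas fetching the same page from the console with wget https://dreaminislam.com/a/ or with curl works fine. How do I set the right options on LWP::UserAgent so that it fetches the page the way a real browser would? Here is my sample code.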
use HTTP::CookieJar::LWP ();
use LWP::UserAgent;
use LWP::Simple;
my $url = qq{https://dreaminislam.com/a/};
my $content = getUrl($url);
exit;
sub getUrl {
    my $url = shift;
    my $jar = HTTP::CookieJar::LWP->new;
    my $ua = LWP::UserAgent->new(timeout => 180, cookie_jar => $jar, protocols_allowed => ['http', 'https']);
    $ua->agent(qq{Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0});
    my $response = $ua->get($url);
    if ($response->is_success) {
        my $content = $response->decoded_content;
        return $content;
    } else {
        my $content = $response->decoded_content;
        printf "Get url error [%d] %s.\n", $response->code, $response->message;
    }
}
The site appears to have some anti-bot protection in place. It seems to require at least a User-Agent and an Accept header:
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => 'https://dreaminislam.com/a/');
$req->header('User-Agent' => 'Mozilla/5.0');
$req->header('Accept' => '*/*');
my $response = $ua->request($req);
die $response->code if ! $response->is_success;
print $response->decoded_content;
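The code in the question already sets a User-Agent but never sends an Accept header, whereas curl and wget send Accept: */* by default, which would explain why they get through. As a minimal sketch (not tested against the site), the same fix can be applied once on the user agent object with default_header, so that plain $ua->get calls keep working. The helper name make_browser_like_ua and the command-line handling are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not from the original post): builds a user agent
# whose requests carry browser-like default headers.
sub make_browser_like_ua {
    my $ua = LWP::UserAgent->new(timeout => 180);

    # Same User-Agent string as in the question.
    $ua->agent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0');

    # Default headers are added to every request that does not
    # already set the header itself.
    $ua->default_header('Accept' => '*/*');

    return $ua;
}

# Fetch only when a URL is given on the command line, e.g.
#   perl fetch.pl https://dreaminislam.com/a/
if (my $url = shift @ARGV) {
    my $response = make_browser_like_ua()->get($url);
    die sprintf("Get url error [%d] %s\n", $response->code, $response->message)
        unless $response->is_success;
    print $response->decoded_content;
}
```

If the site still answers 403 with these two headers, it is presumably checking something else as well (for example Accept-Language or cookies); further default_header calls can be added in the same place.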