How do I download an image file from a website using WWW::Mechanize?
I am trying to download an image from a server. This is what I have tried so far:
use warnings;
use strict;
use WWW::Mechanize;
my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";
my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
$mech->submit_form(
form_number => 1,
fields => {
'notice' => $sequence,
},
);
$mech->find_image( alt_regex => qr/.+sopma2.gif/ );
open (FH, ">soi.gif");
binmode (FH);
print FH $mech;
The image tag looks like this:
<img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif">
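I have already parsed the image link out of the page, but I want to download the image itself. How can I do that?
The find_image method of WWW::Mechanize returns a WWW::Mechanize::Image object. It only holds information about the image (its URI, file name, and alt text), not the content of the image itself. You need to download the image file first.
Fortunately, you can use $mech for that. $image has a URI method that returns the URL of the image file as a URI object. Your $mech can get that image, and it comes back as an HTTP::Response.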
my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ );
my $res = $mech->get($image->URI);
if ($res->is_success) {
open (my $fh, '>', 'soi.gif') or die $!;
binmode $fh;
print $fh $res->decoded_content;
# no need to close lexical filehandle
}
And there is your image file.
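You can use $mech->get(...) to store the content of a URL in a local file.
# The image has no alt text, so match on its URL instead of alt_regex
if ( my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ ) ) {
    $mech->get( $image->url_abs, ':content_file' => 'soi.gif' );
}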
How do I save an image with WWW::Mechanize?
man WWW::Mechanize
$mech->find_image()
Finds an image in the current page. It returns a WWW::Mechanize::Image object which describes the image. If it fails to find an image it returns undef.
...
$mech->get( $uri )
Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri can be a well-formed URL string, a URI object, or a WWW::Mechanize::Link object. [...]
"get()" is a well-behaved overloaded version of the method in LWP::UserAgent. This lets you do things like
$mech->get( $uri, ':content_file' => $tempfile );
Use LWP::Simple together with WWW::Mechanize.
use WWW::Mechanize;
use LWP::Simple;
my $sequence = "MIPTLAA......";
my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
$mech->submit_form(
form_number => 1,
fields => {
'notice' => $sequence,
},
);
my $cont = $mech->content;
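# Pull the image path out of the page source (case-insensitive, optional quote)
my ($img) = $cont =~ m/src="?([^"\s>]*sopma2\.gif)/i
    or die "No sopma2.gif image found in the page";
# $img starts with a slash (e.g. /tmp/....sopma2.gif), so append it directly to the host
my $url = "https://npsa-prabi.ibcp.fr$img";
getstore( $url, "soi.gif" );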
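$img holds the URL path of the image, and the getstore function from LWP::Simple then saves the image to disk.
This is not a good idea; see @simbabque's answer. But it will give you the result you need.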
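The problem is that you are searching for an image whose alt text contains the string sopma2.gif. The image has no alt text, so your program does not find it.
This program fetches the gif file you want. I am using url_regex => qr/sopma2/i to look for sopma2 in the URL, which succeeds and returns a WWW::Mechanize::Image object. All that is then necessary is to get the absolute URL of the object and use get with the :content_file parameter to save the data to a disk file.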
use strict;
use warnings;
use 5.010;
use WWW::Mechanize;
STDOUT->autoflush;
my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";
my $mech = WWW::Mechanize->new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
say $mech->res->status_line;
say $mech->title;
$mech->submit_form(
form_number => 1,
fields => {
notice => $sequence,
},
);
say $mech->res->status_line;
say $mech->title;
my $image = $mech->find_image( url_regex => qr/sopma2/i );
my ($file) = $image->url =~ m|([^/]+\z)|;
$mech->get($image->url_abs, ':content_file' => $file);
say "$file saved";
Output
200 OK
NPS@ : SOPMA secondary structure prediction
200 OK
NPS@ SOPMA secondary structure prediction results
373025433891.sopma2.gif saved
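To check the alt-versus-URL point without contacting the server, here is a minimal sketch. It assumes WWW::Mechanize and libwww-perl (for file:// URLs) are installed. The temporary results.html page is a hypothetical stand-in for the real results page, containing only the img tag quoted in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in for the results page: one <img> tag with no alt attribute
my $dir  = tempdir( CLEANUP => 1 );
my $page = File::Spec->catfile( $dir, 'results.html' );
open my $out, '>', $page or die "Cannot write $page: $!";
print {$out} '<html><body><img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif"></body></html>';
close $out or die "Cannot close $page: $!";

# Load the local file through Mechanize so find_image parses it as HTML
my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs($page) );

# Same lookup twice: once against the alt text, once against the URL
my $by_alt = $mech->find_image( alt_regex => qr/sopma2\.gif/ );
my $by_url = $mech->find_image( url_regex => qr/sopma2\.gif$/ );

print 'alt_regex: ', ( defined $by_alt ? 'found' : 'not found' ), "\n";
print 'url_regex: ', ( defined $by_url ? $by_url->url : 'not found' ), "\n";
```

If the explanation above holds, the alt_regex lookup returns undef and the url_regex lookup returns the image object, which is why the working programs match on the URL.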