How do I download an image file from a website using WWW::Mechanize?

I am trying to download an image from a server. This is what I have tried so far:

use warnings;
use strict; 
use WWW::Mechanize;

my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";

my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
    $mech->submit_form(
        form_number => 1,
        fields => {
        'notice' => $sequence,
        },
    );


$mech->find_image( alt_regex => qr/.+sopma2.gif/ );
open (FH, ">soi.gif");
binmode (FH);
print FH $mech;

The image tag looks like this:

<img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif">

I have parsed the image link from the page, but I want to download the image itself. How can I do that?

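The find_image method of WWW::Mechanize returns a WWW::Mechanize::Image object. That object only holds information about the image, such as its URL and alt text, not the content of the image itself. You need to download the image file first.

Fortunately, you can use the $image object's URI method, which returns a URI object for that image file. Your $mech can get that image. It comes back as an HTTP::Response.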
my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ );
my $res = $mech->get($image->URI);

if ($res->is_success) {
  open (my $fh, '>', 'soi.gif') or die $!;
  binmode $fh;
  print $fh $res->decoded_content;
  # no need to close lexical filehandle
}

And there is your image file.
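The file-writing step can be tried on its own, without a network request. The following is a minimal sketch using only core modules. The helper name save_bytes and the sample byte string are made up for illustration; in the script above the bytes would come from the response object instead. It closes the handle explicitly so that a failed write is reported rather than silently ignored.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write a byte string to a file with no newline or
# encoding translation, and die if any step fails.
sub save_bytes {
    my ( $path, $bytes ) = @_;
    open my $fh, '>:raw', $path or die "Cannot open $path: $!";
    print {$fh} $bytes          or die "Cannot write $path: $!";
    close $fh                   or die "Cannot close $path: $!";
    return -s $path;            # size on disk in bytes
}

# Stand-in data: a GIF signature followed by bytes that text-mode
# translation could corrupt on some platforms.
my $dir   = tempdir( CLEANUP => 1 );
my $bytes = "GIF89a\x0d\x0a\x00\xff";
my $size  = save_bytes( "$dir/demo.gif", $bytes );

print "wrote $size bytes\n";
```

The ':raw' layer in the open mode has the same effect as calling binmode on the handle afterwards.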

You can use $mech->get(...) to store the content of a URL in a local file.

# Match on the URL: the image in question has no alt text to match against.
if ( my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ ) ) {
  $mech->get( $image->url, ':content_file' => 'soi.gif' );
}

How do I save an image with WWW::Mechanize

man WWW::Mechanize

$mech->find_image()
Finds an image in the current page. It returns a WWW::Mechanize::Image object which describes the image. If it fails to find an image it returns undef.
...
$mech->get( $uri )
Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri can be a well-formed URL string, a URI object, or a WWW::Mechanize::Link object. [...]
"get()" is a well-behaved overloaded version of the method in LWP::UserAgent. This lets you do things like
$mech->get( $uri, ':content_file' => $tempfile );

Use LWP::Simple together with WWW::Mechanize.

use WWW::Mechanize;
use LWP::Simple;
my $sequence = "MIPTLAA......";

my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
    $mech->submit_form(
        form_number => 1,
        fields => {
        'notice' => $sequence,
        },
    );

my $cont = $mech->content;
# Capture the image path from the SRC attribute in the raw HTML.
my ($img) = $cont =~ m/SRC=(.+sopma2\.gif)/g;
my $url = "https://npsa-prabi.ibcp.fr/$img";
getstore($url, "soi.gif");

$img stores the URL of the image.

The getstore function from LWP::Simple then saves the image.

This is not a good idea; see @simbabque's answer. But it will give you the result you need.

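The problem is that you are searching for an image whose alt text contains the string sopma2.gif. That image has no alt text, so your program doesn't find it.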
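To see why the alt-text search comes up empty, here is a small sketch that pulls the attributes out of the tag from the question with a plain regex. The helper tag_attrs is hypothetical and only handles double-quoted attributes; it is a toy for illustration, not how WWW::Mechanize parses HTML.

```perl
use strict;
use warnings;

# The tag from the question, copied verbatim.
my $tag = '<img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif">';

# Hypothetical helper: collect name="value" pairs from a single tag.
sub tag_attrs {
    my ($html) = @_;
    my %attr;
    while ( $html =~ /([\w-]+)\s*=\s*"([^"]*)"/g ) {
        $attr{ lc $1 } = $2;
    }
    return \%attr;
}

my $attr = tag_attrs($tag);

# There is no alt attribute, so a pattern applied to the alt text
# has nothing to match against.
my $alt_matches =
    ( defined( $attr->{alt} ) && $attr->{alt} =~ /.+sopma2.gif/ ) ? 1 : 0;

# The src attribute, on the other hand, does contain the file name.
my $url_matches = ( $attr->{src} =~ /sopma2\.gif$/ ) ? 1 : 0;

print "alt matches: $alt_matches\n";    # alt matches: 0
print "url matches: $url_matches\n";    # url matches: 1
```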

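The program below fetches the gif file you want. I am using url_regex => qr/sopma2/i to look for sopma2 in the URL. That succeeds and returns a WWW::Mechanize::Image object. Then all that is necessary is to get the object's absolute URL and use get with the :content_file parameter to save the data to a disk file.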

use strict;
use warnings;
use 5.010;

use WWW::Mechanize;

STDOUT->autoflush;

my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";

my $mech = WWW::Mechanize->new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');

say $mech->res->status_line;
say $mech->title;

$mech->submit_form(
    form_number => 1,
    fields => {
      notice => $sequence,
    },
);

say $mech->res->status_line;
say $mech->title;

my $image = $mech->find_image( url_regex => qr/sopma2/i );
my ($file) = $image->url =~ m|([^/]+\z)|;
$mech->get($image->url_abs, ':content_file' => $file);
say "$file saved";

Output

200 OK
NPS@ : SOPMA secondary structure prediction
200 OK
NPS@ SOPMA secondary structure prediction results
373025433891.sopma2.gif saved
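The saved file name comes from the last path segment of the image URL. As a standalone sketch of that step, the pattern from the program can be wrapped in a small helper; the name basename_from_url is made up here, and the sample path is the one from the question's image tag.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the final slash,
# using the same pattern as the program above.
sub basename_from_url {
    my ($url) = @_;
    my ($file) = $url =~ m|([^/]+\z)|;
    return $file;    # undef when the URL ends in a slash
}

print basename_from_url('/tmp/e3a3c2b34201.sopma2.gif'), "\n";
# prints e3a3c2b34201.sopma2.gif
```

Note that the pattern keeps anything after the last slash, so a URL with a query string would carry it into the file name.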