在 Google 云端硬盘文档上使用 content-disposition 和 LWP::UserAgent

Using content-disposition with LWP::UserAgent on a Google Drive document

我正在尝试使用从服务器 (content-disposition) 收到的实际文件名来保存 Google 驱动器共享文件:

我试图分析 header:

use strict;
use warnings;
use Data::Dumper;
use LWP::UserAgent qw( );
my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
$str =~ /file\/d\/(\w+)/;
my $url = 'https://drive.google.com/uc?export=download&id='.;
my $ua = LWP::UserAgent->new();
my $response = $ua->head($url)->{'_headers'};
print Dumper( $response );

我知道了:

$VAR1 = bless( {
'client-ssl-cert-subject' => '/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleusercontent.com',
'connection' => 'close',
'date' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'content-type' => 'text/html; charset=UTF-8',
'x-guploader-uploadid' => 'AEnB2UrQJIoJUIIhWnKz9HAlW_2XKApLe_0IDMZjS0gGQOMdRaF68Od2xsxssp7mBdQP9kNrjvDueWUP5pSa1eHbprSjbPvfbA',
'alternate-protocol' => '443:quic,p=1',
'expires' => 'Mon, 19 Oct 2015 14:45:45 GMT',
'::std_case' => {
   'client-ssl-socket-class' => 'Client-SSL-Socket-Class',
   'client-ssl-cert-subject' => 'Client-SSL-Cert-Subject',
   'client-ssl-cipher' => 'Client-SSL-Cipher',
   'client-peer' => 'Client-Peer',
   'x-guploader-uploadid' => 'X-GUploader-UploadID',
   'alternate-protocol' => 'Alternate-Protocol',
   'alt-svc' => 'Alt-Svc',
   'client-ssl-cert-issuer' => 'Client-SSL-Cert-Issuer',
   'client-date' => 'Client-Date',
   'client-response-num' => 'Client-Response-Num'
 },
'client-ssl-cert-issuer' => '/C=US/O=Google Inc/CN=Google Internet Authority G2',
'server' => 'UploadServer',
'client-date' => 'Mon, 19 Oct 2015 14:45:55 GMT',
'client-ssl-socket-class' => 'IO::Socket::SSL',
'client-ssl-cipher' => 'ECDHE-ECDSA-AES128-GCM-SHA256',
'client-peer' => '2a00:1450:400c:c08::84:443',
'alt-svc' => 'quic=":443"; p="1"; ma=604800',
'cache-control' => 'private, max-age=0',
'client-response-num' => 1
}, 'HTTP::Headers' );

我希望在上面找到 content-disposition header。 另一方面,wget 正确给出了文件名:

#wget --content-disposition "https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg"
--2015-10-19 20:21:37--  https://drive.google.com/uc?export=download&id=0B6vqTWO9kmdmdzk5ejhDSXgzMDg
Resolving drive.google.com (drive.google.com)... 2a00:1450:400c:c04::64, 74.125.206.139, 74.125.206.101, ...
Connecting to drive.google.com (drive.google.com)|2a00:1450:400c:c04::64|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download [following]
Warning: wildcards not supported in HTTP.
--2015-10-19 20:21:37--  https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download
Resolving doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)... 2a00:1450:400c:c08::84, 74.125.140.132
Connecting to doc-0s-2s-docs.googleusercontent.com (doc-0s-2s-docs.googleusercontent.com)|2a00:1450:400c:c08::84|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16 [text/plain]
Saving to: ‘testdoc.txttestdoc.txt’

testdoc.txttestdoc.txt                              100%[====================================================================================================================>]      16  --.-KB/s   in 0s

2015-10-19 20:21:38 (550 KB/s) - ‘testdoc.txttestdoc.txt’ saved [16/16]

如何使用 perl 从服务器获取正确的文件名?

造成差异的原因是:

HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-2s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/unidk5uvfpl9kut1rl3hb5lqcvis8vdq/1445263200000/06380472059566149580/*/0B6vqTWO9kmdmdzk5ejhDSXgzMDg?e=download [following]

您的初始 URL 正在被重定向 - 所以您 headers 得到的不是正在下载的文件 wget。尝试阅读 status_line off your head request,看看你得到了什么。

区别在于您没有使用 wget 执行 HEAD。如果您查看 Perl 代码中的响应状态,您会得到

503 Service Unavailable

可以指很多东西,但在这种情况下意味着 HEAD 不受支持。将其更改为 GET,一切都很好

use strict;
use warnings;

use feature 'say';

use LWP::UserAgent;

my $str = 'https://drive.google.com/file/d/0B6vqTWO9kmdmdzk5ejhDSXgzMDg/view?usp=sharing';
die unless $str =~ m{file/d/(\w+)};

my $url = "https://drive.google.com/uc?export=download&id=";

my $ua = LWP::UserAgent->new;

my $resp = $ua->get($url);

say $resp->status_line;

say $resp->header('Content-Disposition');

输出

200 OK
attachment;filename="testdoc.txt";filename*=UTF-8''testdoc.txt