WGET - HTTPS vs HTTP = HTTPS Slower

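I'm running some tests with wget, and I noticed that for the same server, HTTPS pages take much longer to load in wget than HTTP pages. This doesn't seem to be related to any network difference: wget spends about 5 extra seconds before name resolution. Can anyone help? How can I overcome this? I noticed it while looking into using wget with the -p and -H options to evaluate network performance.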

xbian@xbian ~ $ wget -V
GNU Wget 1.13.4 built on linux-gnueabihf.

+digest +https +ipv6 +iri +large-file +nls -ntlm +opie +ssl/gnutls

Wgetrc:
    /etc/wgetrc (system)
Locale: /usr/share/locale
Compile: gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
    -DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib
    -D_FORTIFY_SOURCE=2 -Iyes/include -g -O2 -fstack-protector
    --param=ssp-buffer-size=4 -Wformat -Werror=format-security
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
Link: gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat
    -Werror=format-security -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
    -Wl,-z,relro -Lyes/lib -lgnutls -lgcrypt -lgpg-error -lz -lidn -lrt
    ftp-opie.o gnutls.o ../lib/libgnu.a

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.
xbian@xbian ~ $ time wget -d -v --no-check-certificate --delete-after -4 http://www.google.pt 2>&1  | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
2015-02-07 01:10:57 Setting --verbose (verbose) to 1
2015-02-07 01:10:57 Setting --check-certificate (checkcertificate) to 0
2015-02-07 01:10:57 Setting --delete-after (deleteafter) to 1
2015-02-07 01:10:57 Setting --inet4-only (inet4only) to 1
2015-02-07 01:10:57 DEBUG output created by Wget 1.13.4 on linux-gnueabihf.
2015-02-07 01:10:57
2015-02-07 01:10:57 URI encoding = `UTF-8'
2015-02-07 01:10:57 --2015-02-07 01:10:57--  http://www.google.pt/
2015-02-07 01:10:57 Resolving www.google.pt (www.google.pt)... 213.30.5.52, 213.30.5.24, 213.30.5.18, ...
2015-02-07 01:10:57 Caching www.google.pt => 213.30.5.52 213.30.5.24 213.30.5.18 213.30.5.25 213.30.5.59 213.30.5.31 213.30.5.45 213.30.5.46 213.30.5.39 213.30.5.53 213.30.5.32 213.30.5.38
2015-02-07 01:10:57 Connecting to www.google.pt (www.google.pt)|213.30.5.52|:80... connected.
2015-02-07 01:10:57 Created socket 3.
2015-02-07 01:10:57 Releasing 0x003b8040 (new refcount 1).
2015-02-07 01:10:57
2015-02-07 01:10:57 ---request begin---
2015-02-07 01:10:57 GET / HTTP/1.1
2015-02-07 01:10:57 User-Agent: Wget/1.13.4 (linux-gnueabihf)
2015-02-07 01:10:57 Accept: */*
2015-02-07 01:10:57 Host: www.google.pt
2015-02-07 01:10:57 Connection: Keep-Alive
2015-02-07 01:10:57
2015-02-07 01:10:57 ---request end---
2015-02-07 01:10:58 HTTP request sent, awaiting response...
2015-02-07 01:10:58 ---response begin---
2015-02-07 01:10:58 HTTP/1.1 200 OK
2015-02-07 01:10:58 Date: Sat, 07 Feb 2015 01:10:58 GMT
2015-02-07 01:10:58 Expires: -1
2015-02-07 01:10:58 Cache-Control: private, max-age=0
2015-02-07 01:10:58 Content-Type: text/html; charset=ISO-8859-1
2015-02-07 01:10:58 Set-Cookie: PREF=ID=98608883e4031983:FF=0:TM=1423271458:LM=1423271458:S=BnwaLDxFbjCUyPnF; expires=Mon, 06-Feb-2017 01:10:58 GMT; path=/; domain=.google.pt
2015-02-07 01:10:58 Set-Cookie: NID=67=AkXpY2nJPDDcH7xKJkslxdCtflnhOZJiNwZdu4YBAIc2FnjIZIAYHzFuln5boxiOHq1WWBdbcTnLXwPqOrfxOxkLXtO2U5UAVBCU0nVcgyC61_YLZLXGR0Fmdi9M_fIp; expires=Sun, 09-Aug-2015 01:10:58 GMT; path=/; domain=.google.pt; HttpOnly
2015-02-07 01:10:58 P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
2015-02-07 01:10:58 Server: gws
2015-02-07 01:10:58 X-XSS-Protection: 1; mode=block
2015-02-07 01:10:58 X-Frame-Options: SAMEORIGIN
2015-02-07 01:10:58 Alternate-Protocol: 80:quic,p=0.02
2015-02-07 01:10:58 Accept-Ranges: none
2015-02-07 01:10:58 Vary: Accept-Encoding
2015-02-07 01:10:58 Transfer-Encoding: chunked
2015-02-07 01:10:58
2015-02-07 01:10:58 ---response end---
2015-02-07 01:10:58 200 OK
2015-02-07 01:10:58 cdm: 1 2 3 4 5 6 7 8
2015-02-07 01:10:58 Stored cookie google.pt -1 (ANY) / <permanent> <insecure> [expiry 2017-02-06 01:10:58] PREF ID=98608883e4031983:FF=0:TM=1423271458:LM=1423271458:S=BnwaLDxFbjCUyPnF
2015-02-07 01:10:58 cdm: 1 2 3 4 5 6 7 8
2015-02-07 01:10:58 Stored cookie google.pt -1 (ANY) / <permanent> <insecure> [expiry 2015-08-09 02:10:58] NID 67=AkXpY2nJPDDcH7xKJkslxdCtflnhOZJiNwZdu4YBAIc2FnjIZIAYHzFuln5boxiOHq1WWBdbcTnLXwPqOrfxOxkLXtO2U5UAVBCU0nVcgyC61_YLZLXGR0Fmdi9M_fIp
2015-02-07 01:10:58 Registered socket 3 for persistent reuse.
2015-02-07 01:10:58 URI content encoding = `ISO-8859-1'
2015-02-07 01:10:58 Length: unspecified [text/html]
2015-02-07 01:10:58 Saving to: `index.html'
2015-02-07 01:10:58
2015-02-07 01:10:58      0K .......... .......                                     17.6M=0.001s
2015-02-07 01:10:58
2015-02-07 01:10:58 2015-02-07 01:10:58 (17.6 MB/s) - `index.html' saved [18301]
2015-02-07 01:10:58
2015-02-07 01:10:58 Removing file due to --delete-after in main():
2015-02-07 01:10:58 Removing index.html.

real    0m0.350s
user    0m0.038s
sys     0m0.027s
xbian@xbian ~ $ time wget -d -v --no-check-certificate --delete-after -4 https://www.google.pt 2>&1  | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
2015-02-07 01:11:01 Setting --verbose (verbose) to 1
2015-02-07 01:11:01 Setting --check-certificate (checkcertificate) to 0
2015-02-07 01:11:01 Setting --delete-after (deleteafter) to 1
2015-02-07 01:11:01 Setting --inet4-only (inet4only) to 1
2015-02-07 01:11:01 DEBUG output created by Wget 1.13.4 on linux-gnueabihf.
2015-02-07 01:11:01
2015-02-07 01:11:01 URI encoding = `UTF-8'
2015-02-07 01:11:01 --2015-02-07 01:11:01--  https://www.google.pt/
2015-02-07 01:11:06 Resolving www.google.pt (www.google.pt)... 213.30.5.25, 213.30.5.53, 213.30.5.38, ...
2015-02-07 01:11:06 Caching www.google.pt => 213.30.5.25 213.30.5.53 213.30.5.38 213.30.5.32 213.30.5.24 213.30.5.46 213.30.5.39 213.30.5.18 213.30.5.52 213.30.5.31 213.30.5.59 213.30.5.45
2015-02-07 01:11:06 Connecting to www.google.pt (www.google.pt)|213.30.5.25|:443... connected.
2015-02-07 01:11:06 Created socket 4.
2015-02-07 01:11:06 Releasing 0x00b53d48 (new refcount 1).
2015-02-07 01:11:06
2015-02-07 01:11:06 ---request begin---
2015-02-07 01:11:06 GET / HTTP/1.1
2015-02-07 01:11:06 User-Agent: Wget/1.13.4 (linux-gnueabihf)
2015-02-07 01:11:06 Accept: */*
2015-02-07 01:11:06 Host: www.google.pt
2015-02-07 01:11:06 Connection: Keep-Alive
2015-02-07 01:11:06
2015-02-07 01:11:06 ---request end---
2015-02-07 01:11:06 HTTP request sent, awaiting response...
2015-02-07 01:11:06 ---response begin---
2015-02-07 01:11:06 HTTP/1.1 200 OK
2015-02-07 01:11:06 Date: Sat, 07 Feb 2015 01:11:06 GMT
2015-02-07 01:11:06 Expires: -1
2015-02-07 01:11:06 Cache-Control: private, max-age=0
2015-02-07 01:11:06 Content-Type: text/html; charset=ISO-8859-1
2015-02-07 01:11:06 Set-Cookie: PREF=ID=579b1dd2360c9122:FF=0:TM=1423271466:LM=1423271466:S=9zOSotidcZWjJfXX; expires=Mon, 06-Feb-2017 01:11:06 GMT; path=/; domain=.google.pt
2015-02-07 01:11:06 Set-Cookie: NID=67=Jetj6llJijt09db9ekqGS6cBo3DE0CDqfQkp9Sh8xtLyYnNGU5zHoMED0whNkToP_w6mk6-oLTSRVdYIDekUEZH02oBYQPQhHmhpQzENI08zGNg9Jxn4EkXTIVApLCAG; expires=Sun, 09-Aug-2015 01:11:06 GMT; path=/; domain=.google.pt; HttpOnly
2015-02-07 01:11:06 P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
2015-02-07 01:11:06 Server: gws
2015-02-07 01:11:06 X-XSS-Protection: 1; mode=block
2015-02-07 01:11:06 X-Frame-Options: SAMEORIGIN
2015-02-07 01:11:06 Accept-Ranges: none
2015-02-07 01:11:06 Vary: Accept-Encoding
2015-02-07 01:11:06 Transfer-Encoding: chunked
2015-02-07 01:11:06
2015-02-07 01:11:06 ---response end---
2015-02-07 01:11:06 200 OK
2015-02-07 01:11:06 cdm: 1 2 3 4 5 6 7 8
2015-02-07 01:11:06 Stored cookie google.pt -1 (ANY) / <permanent> <insecure> [expiry 2017-02-06 01:11:06] PREF ID=579b1dd2360c9122:FF=0:TM=1423271466:LM=1423271466:S=9zOSotidcZWjJfXX
2015-02-07 01:11:06 cdm: 1 2 3 4 5 6 7 8
2015-02-07 01:11:06 Stored cookie google.pt -1 (ANY) / <permanent> <insecure> [expiry 2015-08-09 02:11:06] NID 67=Jetj6llJijt09db9ekqGS6cBo3DE0CDqfQkp9Sh8xtLyYnNGU5zHoMED0whNkToP_w6mk6-oLTSRVdYIDekUEZH02oBYQPQhHmhpQzENI08zGNg9Jxn4EkXTIVApLCAG
2015-02-07 01:11:06 Registered socket 4 for persistent reuse.
2015-02-07 01:11:06 URI content encoding = `ISO-8859-1'
2015-02-07 01:11:06 Length: unspecified [text/html]
2015-02-07 01:11:06 Saving to: `index.html'
2015-02-07 01:11:06
2015-02-07 01:11:06      0K .......... .......                                      670K=0.03s
2015-02-07 01:11:06
2015-02-07 01:11:06 2015-02-07 01:11:06 (670 KB/s) - `index.html' saved [18319]
2015-02-07 01:11:06
2015-02-07 01:11:06 Removing file due to --delete-after in main():
2015-02-07 01:11:06 Removing index.html.

real    0m5.371s
user    0m4.083s
sys     0m0.280s

With curl, the difference seems small...

xbian@xbian ~ $ curl -V
curl 7.26.0 (arm-unknown-linux-gnueabihf) libcurl/7.26.0 OpenSSL/1.0.1e zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: Debug GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP
xbian@xbian ~ $ time curl -s http:///www.google.pt > /dev/null

real    0m0.140s
user    0m0.056s
sys     0m0.034s
xbian@xbian ~ $ time curl -s https:///www.google.pt > /dev/null

real    0m0.294s
user    0m0.060s
sys     0m0.031s

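Setting up SSL/TLS adds some overhead because a session key has to be established, but that overhead is usually negligible, so I doubt it is the real cause. Then again, who knows.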

How can I overcome this?

You can't.

The difference between HTTP and HTTPS is that the latter uses SSL/TLS to secure the connection. SSL/TLS carries significant overheads:

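  • At startup, the client and server exchange certificates and other data so that (at least) the client can verify that the server is not an impostor.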

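    This startup negotiation takes a number of client <-> server message exchanges. If the TCP/IP-level connection has significant latency, it will show up as a noticeable delay.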

  • Once the connection is established, the data sent over it is encrypted by the sender and decrypted by the receiver.
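A rough, illustrative model of that startup cost, assuming a full TLS 1.2 handshake (no session resumption or False Start) and ignoring DNS lookup and server processing time:

$$
T_{\text{HTTP}} \approx \underbrace{1\,\mathrm{RTT}}_{\text{TCP handshake}} + \underbrace{1\,\mathrm{RTT}}_{\text{request/response}},
\qquad
T_{\text{HTTPS}} \approx T_{\text{HTTP}} + \underbrace{2\,\mathrm{RTT}}_{\text{TLS 1.2 handshake}} + t_{\text{crypto}}
$$

With an RTT of 20 ms (an arbitrary example value), the two extra round trips add about 40 ms, plus the local cryptographic work $t_{\text{crypto}}$, which depends heavily on the CPU.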


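If you want to talk securely to regular, current-generation web servers, I don't think there is any practical alternative to HTTPS. I don't think that changes with the "next generation" of HTTP, i.e. HTTP/2.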

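The only way to speed things up (with HTTP/1.1 or HTTP/2) is to reuse a "persistent connection" for multiple GETs, because the SSL/TLS negotiation happens only when the connection is established. However, persistent connections don't help in the "single shot" use case, e.g. when you use wget or curl to fetch a single file.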
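Under the same simplified model, and assuming the server keeps the connection open, fetching $N$ resources one after another costs roughly:

$$
T_{\text{separate}}(N) \approx N\,\bigl(T_{\text{setup}} + T_{\text{req}}\bigr),
\qquad
T_{\text{persistent}}(N) \approx T_{\text{setup}} + N\,T_{\text{req}}
$$

where $T_{\text{setup}}$ covers the TCP and TLS handshakes and $T_{\text{req}}$ one request/response exchange. For $N = 1$, the single-shot case, the two expressions are equal, which is why connection reuse cannot help there.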

How can I overcome this?

This is not a problem with GNU Wget. I tried running your commands:

$ time wget -d -v --no-check-certificate --delete-after -4 http://www.google.pt 2>&1  | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
$ time wget -d -v --no-check-certificate --delete-after -4 https://www.google.pt 2>&1  | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'

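and ran each of them about 10 times. The end result? A small time difference, as expected, due to the SSL/TLS negotiation. This is entirely in line with what I expect from GNU Wget. So why do you see such a large difference?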

Let's look at the output of the https version:

2015-02-07 01:11:01 Setting --verbose (verbose) to 1
2015-02-07 01:11:01 Setting --check-certificate (checkcertificate) to 0
2015-02-07 01:11:01 Setting --delete-after (deleteafter) to 1
2015-02-07 01:11:01 Setting --inet4-only (inet4only) to 1
2015-02-07 01:11:01 DEBUG output created by Wget 1.13.4 on linux-gnueabihf.
2015-02-07 01:11:01
2015-02-07 01:11:01 URI encoding = `UTF-8'
2015-02-07 01:11:01 --2015-02-07 01:11:01--  https://www.google.pt/
2015-02-07 01:11:06 Resolving www.google.pt (www.google.pt)... 213.30.5.25, 213.30.5.53, 213.30.5.38, ...
2015-02-07 01:11:06 Caching www.google.pt => 213.30.5.25 213.30.5.53 213.30.5.38 213.30.5.32 213.30.5.24 213.30.5.46 213.30.5.39 213.30.5.18 213.30.5.52 213.30.5.31 213.30.5.59 213.30.5.45
2015-02-07 01:11:06 Connecting to www.google.pt (www.google.pt)|213.30.5.25|:443... connected.
2015-02-07 01:11:06 Created socket 4.
2015-02-07 01:11:06 Releasing 0x00b53d48 (new refcount 1).
2015-02-07 01:11:06
2015-02-07 01:11:06 ---request begin---
2015-02-07 01:11:06 GET / HTTP/1.1
2015-02-07 01:11:06 User-Agent: Wget/1.13.4 (linux-gnueabihf)
2015-02-07 01:11:06 Accept: */*
2015-02-07 01:11:06 Host: www.google.pt
2015-02-07 01:11:06 Connection: Keep-Alive
2015-02-07 01:11:06
2015-02-07 01:11:06 ---request end---

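I have only included the output up to the point where Wget sends the first request. The SSL/TLS negotiation that everyone blames for the sharp increase in time cannot start until the TCP connection has been established, yet if you look closely, more than 5 s have already elapsed by then!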

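So the behaviour you noticed is definitely not caused by the overhead of using HTTPS. What is it, then? Look at the output carefully once more. Between which two lines does the most time pass?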

2015-02-07 01:11:01 --2015-02-07 01:11:01--  https://www.google.pt/
2015-02-07 01:11:06 Resolving www.google.pt (www.google.pt)... 213.30.5.25, 213.30.5.53, 213.30.5.38, ...

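This means Wget took about 5 seconds to resolve the domain name to an IP address. However, DNS resolution is not something Wget handles itself at all: Wget asks the system for the IP address. This can be seen in the file host.c:329:
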
static void
gethostbyname_with_timeout_callback (void *arg)
{
  struct ghbnwt_context *ctx = (struct ghbnwt_context *)arg;
  /* The lookup itself is delegated to the C library's resolver. */
  ctx->hptr = gethostbyname (ctx->host_name);
}

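So what is really happening in your case is that your system needs some extra time to resolve the hostname, which can happen for a variety of reasons. However, instead of running the test several times, you made a hasty generalization and simply assumed that Wget is very slow at HTTPS.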

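After some testing and discussion with the wget developers, I concluded that this is an issue with the gnutls library. If wget is compiled with OpenSSL instead, it behaves more like curl.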

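Your machine may be attempting an IPv6 DNS lookup that fails because of a misconfiguration. It falls back to IPv4 after a timeout, and the connection then succeeds. If that is the problem, you need to fix your IPv6 configuration or disable IPv6 entirely.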

To test this theory, use "ping6" to ping the host you are trying to connect to. My guess is that "ping6" will fail while "ping" will succeed.

How to test:

greg@mycomputer:~$ ping6 www.google.pt
PING www.google.pt(ord30s26-in-x03.1e100.net) 56 data bytes
64 bytes from ord30s26-in-x03.1e100.net: icmp_seq=1 ttl=53 time=19.5 ms
64 bytes from ord30s26-in-x03.1e100.net: icmp_seq=2 ttl=53 time=18.3 ms
^C
--- www.google.pt ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 18.342/18.970/19.599/0.643 ms
greg@mycomputer:~$ ping www.google.pt
PING www.google.pt (216.58.192.227) 56(84) bytes of data.
64 bytes from ord30s26-in-f3.1e100.net (216.58.192.227): icmp_seq=1 ttl=54 time=19.0 ms
64 bytes from ord30s26-in-f3.1e100.net (216.58.192.227): icmp_seq=2 ttl=54 time=18.3 ms