How can you reliably get the Digest / Hash / Fingerprint of a file you are downloading via http(s) before you actually download it?

Question

Take the following URL:

https://i.imgur.com/oEdf6Rl.png

When I request it, I get these response headers:

Connection: keep-alive
Content-Length: 44374
Last-Modified: Sun, 21 Feb 2021 15:14:36 GMT
ETag: "83c16cca4ee371145485130383104315"
Content-Type: image/png
cache-control: public, max-age=31536000
Accept-Ranges: bytes
Date: Thu, 25 Feb 2021 18:33:52 GMT
Age: 357546
X-Served-By: cache-bwi5147-BWI, cache-sea4455-SEA
X-Cache: HIT, HIT
X-Cache-Hits: 1, 1
X-Timer: S1614278033.761056,VS0,VE1
Strict-Transport-Security: max-age=300
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Origin: *
Server: cat factory 1.0
X-Content-Type-Options: nosniff


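I don't see any hash in there. I have read that you can request optional hashes with headers such as Content-MD5 (https://www.rfc-editor.org/rfc/rfc1864) and Want-Digest (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Want-Digest), although I have yet to find a single file that supports the latter header.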

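What does not make sense to me is that when you download that image, and it does not provide a hash like an md5, how is it verifying that what I received was in fact what was sent?
If it is verifying with a hash, why are there all these random optional headers where you can request hashes?
I need to know if there is a reliable way to get file's fingerprint/hash/digest before actually downloading it. If not; is there a "most" reliable method?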

Answer 1

• What does not make sense to me is that when you download that image, and it does not provide a hash like an md5, how is it verifying that what I received was in fact what was sent?

It isn't. That said, the fact that it is served over HTTPS gives some assurance that the message has not been altered.

• If it is verifying with a hash, why are there all these random optional headers where you can request hashes?

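These are extras. It should be noted, though, that RFC 1864 applies to MIME messages (i.e. email attachments) rather than to HTTP requests. The Content-MD5 header for HTTP has been deprecated, and the Want-Digest/Digest approach is used instead.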

But if HTTPS can guarantee that the message has not been tampered with, why is this needed at all? The draft spec covers this:

“However, there are cases where relying on this alone is insufficient. An HTTP-level integrity mechanism that operates independent of transfer can be used to detect programming errors and/or corruption of data at rest, be used across multiple hops in order to provide end-to-end integrity guarantees, aid fault diagnosis across hops and system boundaries, and can be used to validate integrity when reconstructing a resource fetched using different HTTP connections.”

• I need to know if there is a reliable way to get file's fingerprint/hash/digest before actually downloading it. If not; is there a "most" reliable method?

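The Digest header is, IMHO, rarely supported, so the most reliable method is to download the hash separately. Many software download pages (for example Apache) provide hashes as separate download links alongside the binaries. Those are meant to ensure the integrity of the final download rather than of the HTTP transfer, though.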
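As a rough illustration of checking a download against a separately published hash, here is a minimal Python sketch. It assumes the checksum is published as a `sha256sum`-style line (hex digest first, then the file name); real projects use various formats and algorithms, so the parsing is only an assumption. The function names `file_sha256` and `matches_published_checksum` are made up for this example.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, checksum_line):
    """Compare a file with a published checksum line.

    Assumption: the line looks like "<hex digest>  <file name>" or is a bare
    hex digest, so the first whitespace-separated field is the digest.
    """
    fields = checksum_line.split()
    if not fields:
        return False
    return file_sha256(path) == fields[0].lower()
```

This only tells you the file matches the published hash; the hash itself still has to come from a source you trust, such as the project's own HTTPS site.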

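Honestly, HTTPS covers most of the need for a digest at the transport layer, which is why you don't see digests used much, unless someone is building a specialised application that wants to send one separately.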

Answer 2

What does not make sense to me is that when you download that image, and it does not provide a hash like an md5, how is it verifying that what I received was in fact what was sent?

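Most HTTP servers don't support providing a digest. Some do, but it is very rare. If you're using HTTPS, TLS guarantees that the data is intact and has not been tampered with, by means of either an HMAC value (a keyed hash) or an AEAD encryption algorithm (essentially encryption and a keyed hash in one). Assuming the connection was terminated properly, and both sides use the protocol correctly and verify integrity correctly, you can be sure that the data sent by the other end of the connection arrived intact.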
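To make the "keyed hash" idea concrete, here is a small Python sketch using the standard `hmac` module. It is only an illustration of the general technique, not of how TLS derives keys or frames records; the helper names `make_tag` and `is_intact` and the SHA-256 choice are assumptions for this example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

A receiver holding the same key accepts the message only if `is_intact` returns `True`. Unlike a plain hash sent next to the data, an attacker who modifies the message cannot recompute a valid tag without the key.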

If it is verifying with a hash, why are there all these random optional headers where you can request hashes?

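These header values are most useful when the digest is precomputed and fixed, and is used to detect accidental problems between the system that created the content and the server that serves it. For example, it can be used to detect whether a CDN has a programming error that causes it to truncate data by accident.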
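A minimal Python sketch of that idea: the origin computes a digest value once, and any later hop or client can recheck the body against it. The `sha-256=<base64>` form follows the style of the Digest header as I understand it from RFC 3230; the function names `make_digest_value` and `body_matches` are hypothetical.

```python
import base64
import hashlib


def make_digest_value(body: bytes) -> str:
    """Build a Digest-style value: algorithm name, '=', base64 of the raw hash."""
    encoded = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    return f"sha-256={encoded}"


def body_matches(body: bytes, digest_value: str) -> bool:
    """Recompute the digest for the received body and compare."""
    return make_digest_value(body) == digest_value
```

If an intermediary truncated the body, `body_matches` would return `False` for the shortened bytes, even though each individual TLS hop delivered exactly what it was given.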

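They can also be used in a HEAD request to find out the digest of an object without sending the whole object. For example, if you're using HTTPS and want to collect digests for future use without downloading the data, these headers could be useful. Note that because MD5 is completely insecure and is not used by responsible parties, the Content-MD5 header is completely useless and provides no useful information, although a Digest header using one of the SHA-256 or SHA-512 values would of course be fine.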
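A HEAD request that asks for a digest could be sketched in Python roughly as below, using only the standard library. It assumes the server answers with a `Digest` header of comma-separated `algorithm=base64value` pairs; as noted, most servers will not send one, in which case the function returns `None`. The names `parse_digest_header` and `head_digest` are made up for this example.

```python
import urllib.request


def parse_digest_header(value):
    """Split 'sha-256=abc=, sha-512=def=' into {'sha-256': 'abc=', ...}."""
    digests = {}
    for item in value.split(","):
        # Split on the first '=' only: base64 values may end in '=' padding.
        algorithm, sep, encoded = item.strip().partition("=")
        if sep and encoded:
            digests[algorithm.strip().lower()] = encoded.strip()
    return digests


def head_digest(url, algorithm="sha-256", timeout=10):
    """Send HEAD with Want-Digest; return the base64 digest, or None."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"Want-Digest": algorithm}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Digest")
    if header is None:
        return None  # the common case: the server does not support it
    return parse_digest_header(header).get(algorithm.lower())
```

Calling `head_digest("https://i.imgur.com/oEdf6Rl.png")` would be one way to try it against the URL from the question; judging by the headers shown there, a `None` result is the likely outcome.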

I need to know if there is a reliable way to get file's fingerprint/hash/digest before actually downloading it. If not; is there a "most" reliable method?

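Since most websites don't provide a Digest header, you can't use it in most places. If your goal is something like verifying the integrity of software, various distributors provide hashes, signatures, or both, though that depends on the project. If you just want to make sure the data you receive has not been tampered with, use HTTPS. If you just want to know the hash of the content at an arbitrary URL, then unless the server supports the Digest header, you will simply have to download it to find out.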
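For the fallback case where you have to download the content anyway, the hash can at least be computed while the bytes stream in, without holding the whole file in memory. A minimal Python sketch, with the hypothetical helper names `hash_stream` and `hash_url`:

```python
import hashlib
import urllib.request


def hash_stream(stream, algorithm="sha256", chunk_size=65536):
    """Hash any binary file-like object chunk by chunk; return the hex digest."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def hash_url(url, algorithm="sha256", timeout=30):
    """Download the body at `url` and hash it as it arrives."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return hash_stream(response, algorithm)
```

This still transfers the entire body, so it does not answer the "before downloading" part of the question; it only keeps the cost of learning the hash down to a single pass.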

