MD5 Encoding HTML Giving 2 Different Results

Question

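Can anyone help explain why this happens? If I fetch the HTML from a site with the requests module and compute its MD5 checksum with hashlib, I get one answer. If I then save the HTML to an .html file, open it, and compute the MD5 checksum the same way, I get a different checksum.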

import requests
import hashlib

resp = requests.post("http://casesearch.courts.state.md.us/", timeout=120)
html = resp.text
print("CheckSum 1: " + hashlib.md5(html.encode('utf-8')).hexdigest())

f = open("test.html", "w+")
f.write(html)
f.close()

with open('test.html', "r", encoding='utf-8') as f:
    html2 = f.read()
print("CheckSum 2: " + hashlib.md5(html2.encode('utf-8')).hexdigest())

The results are:

CheckSum 1: e0b253903327c7f68a752c6922d8b47a
CheckSum 2: 3aaf94e0df9f1298d61830d99549ddb0

Answer 1

When a file is read in text mode, Python may translate line endings, depending on the value of the newline argument passed to open(). From the documentation:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.
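To make the quoted behavior concrete, here is a minimal sketch that reads the same file twice, once with the default newline=None and once with newline=''. The file name sample.txt and its contents are made up for illustration; they are not from the question.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write raw bytes so the file contains exactly these three line endings.
with open(path, "wb") as f:
    f.write(b"a\r\nb\rc\n")

# Default (newline=None): universal newlines, everything becomes '\n'.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline='': line endings are returned untranslated.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

print(repr(translated))    # 'a\nb\nc\n'
print(repr(untranslated))  # 'a\r\nb\rc\n'
```

The two strings differ in length and content even though they come from the same bytes on disk.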

This difference in line endings changes the bytes being hashed, and therefore the resulting MD5 checksum.
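A sketch of how the checksums could be made to match, assuming the page uses '\r\n' line endings (the question does not confirm this). The html string below is a stand-in for resp.text, and md5_hex is a helper introduced only for this example. Passing newline='' on both the write and the read keeps line endings untouched; hashing raw bytes in binary mode is an alternative that avoids text-mode translation altogether.

```python
import hashlib
import os
import tempfile

def md5_hex(text):
    """Hypothetical helper: MD5 hex digest of a str encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for resp.text; the '\r\n' endings are an assumption.
html = "<html>\r\n<body>hello</body>\r\n</html>\r\n"
path = os.path.join(tempfile.mkdtemp(), "test.html")

# newline='' on write: no translation of '\n' on any platform.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(html)

# Default read: '\r\n' is translated to '\n', so the hash changes.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline='' on read: line endings preserved, so the hash matches.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

# Binary mode: no decoding and no newline translation at all.
with open(path, "rb") as f:
    raw = f.read()

print(md5_hex(html) == md5_hex(translated))    # False
print(md5_hex(html) == md5_hex(untranslated))  # True
print(md5_hex(html) == hashlib.md5(raw).hexdigest())  # True
```

With a real response, the binary route would mean writing resp.content with mode "wb" and hashing those bytes directly, which also sidesteps any encoding guesswork.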
