Strings unwillingly concatenate in while loop in Python

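I'm writing a simple program that takes a string from a file and hashes it. For some reason, the loop unwillingly concatenates the strings. It works outside of a for or while loop, but does something funky inside one. Here is my code.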

import hashlib

with open('1000-most-common-passwords.txt', 'r') as f:  # Opens the file with all of the strings to compare against.
    unparsed = f.read().splitlines()  # Turns the file contents into a list with one entry per line.
sha1 = hashlib.sha1()
sha1.update(unparsed[0].encode('utf-8'))  # Line 1 is hashed into SHA-1.

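This works fine. I can change the index in unparsed[0], and it selects the string from that line and prints it out. Now I want to do this for every line in the text file, so I wrote a simple while loop. Here's what it looks like.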

i = 0  # Selects the first line.
while i < len(unparsed):  # While i is less than the amount of values in the list, keep going.
    sha1.update(unparsed[i].encode('utf-8'))  # Update the index to the current value in the list.
    print(sha1.hexdigest())
    i += 1

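This doesn't give me any errors. In fact, the output looks just like what I want. But what it actually does is bothering me. Instead of giving me the hash of each value, it gives me the hash of some concatenation of all the previous values. Instead of hashing 123456, it hashes 123456123456123456password. Why does this work outside of the loop but not inside it? Any help is much appreciated.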

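It looks like you want to hash each line separately. update keeps adding to the hash of all the data you have fed that object so far, so you need to create a new hash object for each line to get what you want. Here's how: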

from hashlib import sha1

# Read the file in binary mode so each line is already bytes and needs no re-encoding:
with open('1000-most-common-passwords.txt', 'rb') as f:
    for line in f:  # iterate over the lines in the file one at a time
        pw = line.strip()  # don't include the trailing newline in the hash
        digest = sha1(pw).hexdigest()  # a fresh hash object for every line
        print(f'{pw.decode("utf-8")} hashes to {digest}')  # decode so the output shows 123456 rather than b'123456'
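
If it helps to see the accumulation directly, here is a small sketch that needs no file. It checks that two update calls on one object give the same digest as hashing the concatenated bytes, and then hashes each word with its own object. The sample words and the hash_each helper are made up for illustration and are not part of your program.

```python
import hashlib

# Two update() calls on one object hash the concatenation of both inputs.
running = hashlib.sha1()
running.update(b'123456')
running.update(b'password')
same_as_concatenated = (
    running.hexdigest() == hashlib.sha1(b'123456password').hexdigest()
)
print(same_as_concatenated)


def hash_each(words):
    """Hypothetical helper: one SHA-1 hex digest per word, fresh object each time."""
    return [hashlib.sha1(word.encode('utf-8')).hexdigest() for word in words]


# Sample values standing in for lines of the password file.
digests = hash_each(['123456', 'password'])
print(digests[0] == hashlib.sha1(b'123456').hexdigest())
```

The same idea applies to the original while loop: moving the hashlib.sha1() call inside the loop body should give one independent digest per entry.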