Why do I need to declare encoding before hashing in Python, and how can I do this?

Question

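I am trying to create an AI-like chatbot, and one of its features is a login. I have used the login code before and it worked fine, but I am now having trouble with the code that hashes the passwords. Here is the code: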

import hashlib
...
register = input ("Are you a new user? (y/n) >")

password_file = 'passwords.txt'
if register.lower() == "y": 
    newusername = input ("What do you want your username to be? >")
    newpassword = input ("What do you want your password to be? >")

    newpassword = hashlib.sha224(newpassword).hexdigest()

    file = open(password_file, "a")
    file.write("%s,%s\n" % (newusername, newpassword))
    file.close()

elif register.lower() == ("n"):
    username = input ("What is your username? >")
    password = input ("What is your password? >")

    password = hashlib.sha224(password).hexdigest()

    print ("Loading...")
    with open(password_file) as f:
        for line in f:
            real_username, real_password = line.strip('\n').split(',')
            if username == real_username and password == real_password:
                success = True
                print ("Login successful!")
              #Put stuff here! KKC
    if not success:
        print("Incorrect login details.")

This is the result I get:

Traceback (most recent call last):
  File "<FOLDERSYSTEM>/main.py", line 36, in <module>
    newpassword = hashlib.sha224(newpassword).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

I have looked up the encoding I think I should be using (latin-1), found the required syntax, and added it in, but I still get the same result.

Answer 1

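Hashing works on bytes. str objects contain Unicode text, not bytes, so you must encode first. Pick an encoding that a) can handle all the code points you are likely to encounter, and perhaps b) is also used by any other systems that need to produce the same hashes.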
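To make the point about bytes concrete, here is a minimal sketch. The helper name sha224_hex and the sample password are made up for illustration; only hashlib.sha224 and str.encode come from the standard library. It shows that passing a str raises TypeError, and that the chosen encoding changes the digest as soon as the text contains non-ASCII characters:

```python
import hashlib


def sha224_hex(text, encoding="utf-8"):
    """Encode text to bytes with the given encoding, then hash it.

    Hypothetical helper for illustration; not part of hashlib.
    """
    return hashlib.sha224(text.encode(encoding)).hexdigest()


password = "café"  # hypothetical sample containing one non-ASCII character

# Passing the str directly fails, as in the traceback from the question.
try:
    hashlib.sha224(password)
    raised = False
except TypeError:
    raised = True

# The same text yields different bytes, and so different digests,
# under different encodings.
utf8_digest = sha224_hex(password, "utf-8")
latin1_digest = sha224_hex(password, "latin-1")
```

For pure-ASCII input, UTF-8 and latin-1 produce identical bytes, so the digests only diverge once a character outside ASCII appears. That is why every system comparing hashes has to agree on one encoding.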

If you are the only user of the hashes, then just pick UTF-8; it can handle all of Unicode and is most efficient for Western text:

newpassword = hashlib.sha224(newpassword.encode('utf8')).hexdigest()

The return value of hash.hexdigest() is a Unicode str value, so you can safely compare it with the str values you read from your file.
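Putting the pieces together, the register and login steps from the question might look roughly like the sketch below. The function names hash_password, register_user, and check_login are hypothetical, and the file layout (one "username,hash" pair per line) is taken from the question's code. It is a sketch under those assumptions rather than a complete login system:

```python
import hashlib


def hash_password(password):
    """Encode the password as UTF-8 bytes, then return the SHA-224 hex digest."""
    return hashlib.sha224(password.encode("utf-8")).hexdigest()


def register_user(path, username, password):
    """Append a 'username,hash' line to the password file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write("%s,%s\n" % (username, hash_password(password)))


def check_login(path, username, password):
    """Return True if the username and password hash match a stored line."""
    hashed = hash_password(password)  # a str, like the values read from the file
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # Split on the last comma; a hex digest never contains one.
            real_username, real_password = line.rstrip("\n").rsplit(",", 1)
            if username == real_username and hashed == real_password:
                return True
    return False
```

Returning a boolean from check_login also avoids depending on a success variable that is only assigned when a match is found.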

