Why does this production code work: `base64.b64decode(api_token.encode("utf-8")).decode("utf-8")`?

Question

At work today I came across the following line of code:

decoded_token = base64.b64decode(api_token.encode("utf-8")).decode("utf-8")

It is part of an Airflow ETL script, and decoded_token is used as the Bearer token in an API request. The code runs on a server with Python 2.7, and my colleague tells me it runs successfully every day.

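However, as I understand it, the code first converts api_token to bytes (.encode), then converts the bytes to a string (base64.b64decode), and finally converts that string into a string again (.decode). I thought this would always result in an error.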
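For comparison, here is a minimal sketch in Python 3 (not the Python 2.7 used on the server), where text and bytes are separate types, tracing what each step of the chain actually returns. It assumes a token that is valid base64 of UTF-8 text; `cmFuZG9tLXN0cmluZw==` is just an illustrative value. Note that `base64.b64decode` returns bytes, not a string, so the final `.decode("utf-8")` is the only bytes-to-text step.

```python
import base64

# Illustrative token (assumed): the base64 encoding of "random-string".
api_token = "cmFuZG9tLXN0cmluZw=="

step1 = api_token.encode("utf-8")  # str -> bytes (the base64 text as ASCII bytes)
step2 = base64.b64decode(step1)    # bytes -> bytes (the raw decoded payload)
step3 = step2.decode("utf-8")      # bytes -> str (payload interpreted as UTF-8)

for name, value in [("encode", step1), ("b64decode", step2), ("decode", step3)]:
    print(name, type(value).__name__, repr(value))
```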

import base64
api_token = "random-string"
decoded_token = base64.b64decode(api_token.encode("utf-8")).decode("utf-8")

Running the code locally gives me:

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xad in position 0: invalid start byte
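The same failure can be reproduced in Python 3 (a sketch; Python 3 is assumed here rather than the server's 2.7). With its default `validate=False`, `b64decode` silently discards the non-alphabet `-`, decodes the remaining 12 characters into 9 essentially arbitrary bytes, and the first of those, `0xad`, is not a valid UTF-8 start byte:

```python
import base64

api_token = "random-string"  # not the base64 encoding of any meaningful text

# Default validate=False: the '-' is dropped and "randomstring" is decoded.
raw = base64.b64decode(api_token.encode("utf-8"))
print(raw)

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xad is a UTF-8 continuation byte, so it cannot start a character.
    print("UnicodeDecodeError:", exc.reason, "at position", exc.start)
```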

What input or type does api_token need for this line not to throw an error? Is that even possible, or must some other factor be at play?

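Edit: As Klaus D. mentioned, apparently in Python 2 both encode and decode take and return a str. However, running the code above in Python 2.7 gives me the same error, and I have not yet found an input for api_token that doesn't throw an error.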
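One way to search for inputs that work is to run the exact production chain on candidate tokens and see which ones survive. This is a Python 3 sketch; `survives_production_chain` is a hypothetical helper name, not part of any library. A token passes whenever its base64-decoded bytes happen to be valid UTF-8, which includes every base64 encoding of UTF-8 text.

```python
import base64
import binascii


def survives_production_chain(api_token):
    """Hypothetical helper: True if the production one-liner succeeds.

    Mirrors base64.b64decode(api_token.encode("utf-8")).decode("utf-8")
    with the default (non-validating) b64decode.
    """
    try:
        base64.b64decode(api_token.encode("utf-8")).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # binascii.Error: malformed base64 (e.g. incorrect padding)
        # UnicodeDecodeError: decoded bytes are not valid UTF-8
        return False
    return True


for candidate in ["random-string", "cmFuZG9tLXN0cmluZw==", "abc"]:
    print(candidate, survives_production_chain(candidate))
```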

Answer 1

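The problem is probably just that your test input string is not a base64-encoded string, whereas in production, whatever the input is, it already is one!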

Python 2.7.18 (default, Jan  4 2022, 17:47:56)
...
>>> import base64
>>> api_token = "random-string"
>>> base64.b64decode(api_token)
'\xad\xa9\xdd\xa2k-\xae)\xe0'
>>> base64.b64decode(api_token).decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xad in position 0: invalid start byte

Encode the string as base64 first. After decoding, you don't need to decode it as "utf-8" either, but you can if you expect Unicode characters:

>>> api_token = base64.b64encode(api_token)
>>> api_token
'cmFuZG9tLXN0cmluZw=='
>>> base64.b64decode(api_token)
'random-string'
>>> base64.b64decode(api_token).decode("utf-8")
u'random-string'
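The same round trip in Python 3, as a sketch of what the production setup presumably looks like: the token is assumed to have been base64-encoded once wherever it is stored (an assumption, not something stated in the question), and the production line simply undoes that encoding.

```python
import base64

plain_token = "random-string"  # stands in for the real bearer token (assumed)

# Assumed upstream step: the token is stored base64-encoded.
stored_token = base64.b64encode(plain_token.encode("utf-8")).decode("ascii")
print(stored_token)

# The production line, applied to the stored value, recovers the original.
decoded_token = base64.b64decode(stored_token.encode("utf-8")).decode("utf-8")
print(decoded_token)
```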

An example with non-ASCII characters:

>>> base64.b64decode(base64.b64encode("random string后缀"))
'random string\xe5\x90\x8e\xe7\xbc\x80'
>>> base64.b64decode(base64.b64encode("random string后缀")).decode("utf-8")
u'random string\u540e\u7f00'
>>> import sys
>>> sys.stdout.write(base64.b64decode(base64.b64encode("random string后缀")) + "\n")
random string后缀
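In Python 3 the same non-ASCII round trip needs an explicit `.encode("utf-8")` before `b64encode`, because `b64encode` only accepts bytes-like input. A minimal sketch (using `ascii()` for display so the output matches the escaped Python 2 form above):

```python
import base64

text = "random string后缀"

encoded = base64.b64encode(text.encode("utf-8"))  # b64encode needs bytes, not str
raw = base64.b64decode(encoded)                   # the UTF-8 bytes of the text
decoded = raw.decode("utf-8")                     # back to the original str

print(raw)
print(ascii(decoded))
```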

Note that in Python 2.7, bytes is just an alias for str; a separate unicode type exists specifically to support Unicode!

>>> bytes is str
True
>>> bytes is unicode
False
>>> str("foo")
'foo'
>>> unicode("foo")
u'foo'
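In Python 3 this aliasing is gone: `str` is the text type (playing the role of Python 2's `unicode`) and `bytes` is a separate type. A quick check, run under Python 3:

```python
# Python 3: bytes and str are distinct types.
print(bytes is str)                          # False
print(type("foo").__name__)                  # str
print(type("foo".encode("utf-8")).__name__)  # bytes
print(hasattr(b"foo", "encode"))             # False: bytes can only be decoded
print(hasattr("foo", "decode"))              # False: str can only be encoded
```

So the production line is well-typed in Python 3 as well: `.encode` is called on a str, `b64decode` receives bytes, and `.decode` is called on the bytes it returns.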


Tags: python, python-2.7, airflow