Python 3.5 base64 decoding seems to be incorrect?
In Python 3.5, the base64 module has a function, standard_b64decode(), for decoding a base64 string; it returns a bytes object.
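When I run base64.standard_b64decode("wc=="), the output is b'\xc1'. But when you base64-encode b'\xc1', you get "wQ==". It looks like the decoding function has a bug. In fact, I think "wc==" is an invalid base64-encoded string, by the following reasoning: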
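"wc==" ends with "==", which means it was produced from a single input byte.

In the standard base64 alphabet, 'w' and 'c' have the values 48 and 28, so their 6-bit representations are 110000 and 011100 respectively.

Concatenating them gives 110000011100. The first 8 bits are 11000001, i.e. \xc1, but the remaining 4 bits (1100) are not all zero. They therefore cannot have been produced by the padding step of base64 encoding, which only appends 0 bits. Since a valid base64 encoder can never produce these extra 1 bits, the string is not a valid base64 encoding.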
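The bit arithmetic above can be checked with a short sketch. The ALPHABET constant and the sextets() helper are hypothetical names introduced here for illustration; only base64.standard_b64decode and base64.b64encode come from the standard library. It shows that the decoder keeps the first 8 bits and silently drops the leftover 1100.

```python
import base64

# Standard base64 alphabet (RFC 4648, Table 1); position in the string = 6-bit value
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def sextets(chunk):
    """Hypothetical helper: 6-bit values of the non-padding characters in a chunk."""
    return [ALPHABET.index(ch) for ch in chunk.rstrip("=")]


values = sextets("wc==")
bits = "".join(format(v, "06b") for v in values)
print(values)                           # [48, 28]
print(bits)                             # 110000011100
print(bits[:8], hex(int(bits[:8], 2)))  # 11000001 0xc1  (the decoded byte)
print(bits[8:])                         # 1100  (leftover pad bits, not all zero)

# The decoder keeps the first 8 bits and drops the leftover ones,
# while the encoder always emits zero pad bits for the same byte.
print(base64.standard_b64decode("wc=="))  # b'\xc1'
print(base64.b64encode(b"\xc1"))          # b'wQ=='
```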
I think the same holds for any 4-character base64 block ending in "==" in which any of the last 4 bits of the second character is 1.
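That general rule can be written as a small check. This is a sketch: has_zero_pad_bits() is a hypothetical helper, not part of the base64 module, and it also covers the single "=" case (2 bytes carried in 18 bits, so the low 2 bits of the third character are padding). That extension follows from the same arithmetic but goes beyond what the question states.

```python
# Standard base64 alphabet (RFC 4648, Table 1); position in the string = 6-bit value
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def has_zero_pad_bits(block):
    """Hypothetical helper: True if a final 4-character base64 block has all pad bits zero.

    "xy==" holds 1 byte (8 bits) in 12 bits: the low 4 bits of 'y' are padding.
    "xyz=" holds 2 bytes (16 bits) in 18 bits: the low 2 bits of 'z' are padding.
    A block without '=' holds 3 full bytes and has no pad bits.
    """
    if len(block) != 4:
        raise ValueError("expected a 4-character block")
    if block.endswith("=="):
        return (ALPHABET.index(block[1]) & 0b1111) == 0
    if block.endswith("="):
        return (ALPHABET.index(block[2]) & 0b11) == 0
    return True


print(has_zero_pad_bits("wQ=="))  # True  ('Q' = 16 = 010000)
print(has_zero_pad_bits("wc=="))  # False ('c' = 28 = 011100)
```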
I'm fairly sure this is correct, but I'm not as experienced as the Python developers. Can anyone confirm the above, or explain why it is wrong, if it is?
The Base64 standard is defined in RFC 4648. Your question is answered by §3.5:
Canonical Encoding
The padding step in base 64 and base 32 encoding can, if improperly implemented, lead to non-significant alterations of the encoded data. For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below. If this property do not hold, there is no canonical representation of base-encoded data, and multiple base- encoded strings can be decoded to the same binary data. If this property (and others discussed in this document) holds, a canonical encoding is guaranteed.
In some environments, the alteration is critical and therefore decoders MAY chose to reject an encoding if the pad bits have not been set to zero.
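The RFC's point that "multiple base-encoded strings can be decoded to the same binary data" is easy to see with the string from the question. This sketch simply tries every possible second character after 'w' (the names ALPHABET, aliases and canonical are illustrative only). Because only the top 2 bits of the second character reach the output byte, 16 of the 64 characters decode to the same byte, and only "wQ==" has all pad bits set to zero.

```python
import base64

# Standard base64 alphabet (RFC 4648, Table 1)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Every "w?==" string that standard_b64decode maps to the single byte 0xc1
aliases = [
    "w" + ch + "=="
    for ch in ALPHABET
    if base64.standard_b64decode("w" + ch + "==") == b"\xc1"
]
print(len(aliases))                       # 16
print("".join(a[1] for a in aliases))     # QRSTUVWXYZabcdef

# Only the encoder's own output (all pad bits zero) is the canonical form
canonical = base64.b64encode(b"\xc1").decode("ascii")
print(canonical)                          # wQ==
```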
The meaning of MAY is defined in RFC 2119:
MAY This word, or the adjective "OPTIONAL", mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.
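So the standard does not oblige Python to reject non-canonical encodings; a decoder that ignores non-zero pad bits, as standard_b64decode does, is permitted.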
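If an application does want the stricter behavior the RFC allows, one possible approach (a sketch, not a standard-library feature) is to decode and then re-encode, rejecting the input when the result differs. The function name strict_b64decode is hypothetical. It relies only on base64.b64decode(..., validate=True), which rejects characters outside the base64 alphabet, and on base64.b64encode, and it assumes "canonical" means "exactly what b64encode produces", which is somewhat stricter than checking the pad bits alone.

```python
import base64
import binascii


def strict_b64decode(s):
    """Hypothetical strict decoder: reject any input that is not the canonical encoding.

    "Canonical" is taken to mean exactly what base64.b64encode produces for the
    decoded bytes, so non-zero pad bits (RFC 4648 section 3.5) are rejected.
    """
    if isinstance(s, str):
        s = s.encode("ascii")
    # validate=True raises binascii.Error on characters outside the base64 alphabet
    data = base64.b64decode(s, validate=True)
    # Re-encoding always produces zero pad bits; any mismatch means non-canonical input
    if base64.b64encode(data) != s:
        raise binascii.Error("non-canonical base64 encoding")
    return data
```

With this sketch, strict_b64decode("wQ==") returns b'\xc1', while strict_b64decode("wc==") raises binascii.Error.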