TextEncoder produces UTF-8 instead of request charset encoding
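As part of moving my Thunderbird extension to Thunderbird 60, I need to switch from nsIScriptableUnicodeConverter (never mind what that is if you don't know Mozilla) to the more popular, cross-browser TextDecoder and TextEncoder. The problem is that they don't behave the way I expected.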
Specifically, suppose my string str contains "ùìåí," (without the quotes, of course). Now, when I run:
undecoded_str = new TextEncoder("windows-1252").encode(str);
I expect to get the sequence
F9, EC, E5, ED, 2C
(the one-octet windows-1252 value of each of the 5 characters). But what I actually get is:
C3, B9, C3, AC, C3, A5, C3, AD, 2C
This appears to be the UTF-8 encoding of the string. Why is this happening?
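A minimal reproduction, assuming a current browser or Node.js where TextEncoder is a global, shows that the label argument is silently ignored and the encoder's `encoding` property always reports "utf-8" (the `hex` helper is just for printing):

```javascript
// The label passed to the constructor is ignored: TextEncoder only does UTF-8.
const str = "ùìåí,";
const encoder = new TextEncoder("windows-1252");

// Format bytes as "C3, B9, ..." to match the sequences in the question.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(", ");

console.log(encoder.encoding);         // "utf-8"
console.log(hex(encoder.encode(str))); // "C3, B9, C3, AC, C3, A5, C3, AD, 2C"
```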
Annoyingly, many browsers have simply dropped support for charset encodings other than UTF-8 in TextEncoder (though not in TextDecoder):
Note: Firefox, Chrome and Opera used to have support for encoding types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and gbk). As of Firefox 48 (ticket), Chrome 54 (ticket) and Opera 41, no other encoding types are available other than utf-8, in order to match the spec. In all cases, passing in an encoding type to the constructor will be ignored and a utf-8 TextEncoder will be created (the TextDecoder still allows for other decoding types).
Damn!
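Since TextDecoder still accepts other labels, one possible workaround is to decode all 256 byte values once with TextDecoder("windows-1252") and invert the result into a character-to-byte lookup table. This is a sketch, not a built-in API: `buildEncodeTable` and `encodeSingleByte` are hypothetical helper names, it assumes the runtime's TextDecoder supports windows-1252 (current browsers do; Node.js needs its ICU data), and it only makes sense for single-byte charsets.

```javascript
// Hypothetical helper: build a code point -> byte map for a single-byte
// charset by decoding every byte value with TextDecoder.
function buildEncodeTable(label) {
  const decoder = new TextDecoder(label);
  const table = new Map();
  for (let byte = 0; byte < 256; byte++) {
    const ch = decoder.decode(Uint8Array.of(byte));
    if (ch === "\uFFFD") continue; // byte has no mapping in this charset
    table.set(ch.codePointAt(0), byte);
  }
  return table;
}

// Hypothetical helper: encode a string with such a table.
// Characters the charset cannot represent become "?" (0x3F) by default.
function encodeSingleByte(str, table, replacement = 0x3f) {
  const bytes = [];
  for (const ch of str) { // for..of iterates by code point, not UTF-16 unit
    const byte = table.get(ch.codePointAt(0));
    bytes.push(byte === undefined ? replacement : byte);
  }
  return Uint8Array.from(bytes);
}

const cp1252 = buildEncodeTable("windows-1252");
const bytes = encodeSingleByte("ùìåí,", cp1252);
console.log(Array.from(bytes, (b) => b.toString(16).toUpperCase()).join(", "));
// "F9, EC, E5, ED, 2C"
```

Building the table once and reusing it avoids re-decoding 256 bytes on every call.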