TextEncoder produces UTF-8 instead of the requested charset encoding

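As part of transitioning my Thunderbird extension to Thunderbird 60, I need to switch from nsIScriptableUnicodeConverter (never mind what that is if you don't know Mozilla) to the more popular, cross-browser TextDecoder and TextEncoder. The problem is that they don't behave the way I expect.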

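Specifically, suppose my string str contains "ùìåí," (without the quotes, of course; the trailing comma is part of the string). Now, when I run: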

undecoded_str = new TextEncoder("windows-1252").encode(str);

I expect to get the sequence

F9, EC, E5, ED, 2C

(the single-octet windows-1252 value for each of the 5 characters). But what I actually get is:

C3, B9, C3, AC, C3, A5, C3, AD, 2C

which appears to be the UTF-8 encoding of the string. Why is this happening?
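For anyone who wants to reproduce this, here is a minimal sketch. It assumes an environment with a global TextEncoder whose constructor ignores its argument, as described above; the variable names are only illustrative. It also prints the encoder's `encoding` property, which reports what the encoder will really produce.

```javascript
// The test string, written with escapes so the file's own encoding does not matter:
// "ùìåí," (four accented letters followed by a comma).
const str = "\u00F9\u00EC\u00E5\u00ED,";

// The "windows-1252" label is passed in, but the encoder ignores it.
const encoder = new TextEncoder("windows-1252");
const actualEncoding = encoder.encoding;
console.log(actualEncoding); // "utf-8"

// Format the resulting bytes as upper-case hex, the way they are listed above.
const utf8Hex = Array.from(encoder.encode(str), (b) =>
  b.toString(16).toUpperCase().padStart(2, "0")
).join(", ");
console.log(utf8Hex); // C3, B9, C3, AC, C3, A5, C3, AD, 2C
```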

Annoyingly, many browsers have simply dropped support for charset encodings other than UTF-8 in TextEncoder:

Note: Firefox, Chrome and Opera used to have support for encoding types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and gbk). As of Firefox 48 (ticket), Chrome 54 (ticket) and Opera 41, no other encoding types are available other than utf-8, in order to match the spec. In all cases, passing in an encoding type to the constructor will be ignored and a utf-8 TextEncoder will be created (the TextDecoder still allows for other decoding types).

Damn!
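Since TextEncoder can no longer do this, one possible workaround is to encode by hand. The following is only a sketch: `encodeWindows1252` and `CP1252_HIGH` are hypothetical names of my own, not part of any browser or Thunderbird API, and replacing unmappable characters with "?" is an assumption you may want to change. For windows-1252 the mapping is simple: ASCII and the range U+00A0 to U+00FF map to the byte with the same value, and only the bytes 0x80 to 0x9F need a lookup table.

```javascript
// Hypothetical lookup table: Unicode code point -> windows-1252 byte,
// for the characters that windows-1252 places in the 0x80-0x9F range.
const CP1252_HIGH = new Map([
  [0x20AC, 0x80], [0x201A, 0x82], [0x0192, 0x83], [0x201E, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02C6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8A], [0x2039, 0x8B], [0x0152, 0x8C],
  [0x017D, 0x8E], [0x2018, 0x91], [0x2019, 0x92], [0x201C, 0x93],
  [0x201D, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02DC, 0x98], [0x2122, 0x99], [0x0161, 0x9A], [0x203A, 0x9B],
  [0x0153, 0x9C], [0x017E, 0x9E], [0x0178, 0x9F],
]);

// Hypothetical helper: returns a Uint8Array with one byte per character.
function encodeWindows1252(str) {
  const bytes = [];
  for (const ch of str) {                 // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
      bytes.push(cp);                     // same value in windows-1252
    } else if (CP1252_HIGH.has(cp)) {
      bytes.push(CP1252_HIGH.get(cp));    // special 0x80-0x9F characters
    } else {
      bytes.push(0x3F);                   // assumption: "?" for anything unmappable
    }
  }
  return Uint8Array.from(bytes);
}

// The string from the question: "ùìåí,"
const encoded = encodeWindows1252("\u00F9\u00EC\u00E5\u00ED,");
console.log(
  Array.from(encoded, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(", ")
); // F9, EC, E5, ED, 2C
```

Other single-byte charsets would need their own tables, and multi-byte ones such as gbk are better handled by an existing library than by a hand-written table like this.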