Does HTML5 permit encoding external scripts in UTF-16?

The HTML standard requires¹ the use of the UTF-8 encoding for HTML documents.

Does it permit the use of other encodings for externally loaded scripts?

<script src="/script1.js"></script>
<script type="module" src="/script2.mjs"></script>

These scripts would be encoded in UTF-16 rather than UTF-8, and would be served by a web server with the header Content-Type: text/javascript; charset=UTF-16. Does this setup conform to the HTML specification?


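  1. “The charset attribute of the meta element specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present, its value must be an ASCII case-insensitive match for the string "utf-8"” (§ 4.2.5). “Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8” (§ 4.2.5.4).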

The HTML standard requires the use of the UTF-8 encoding for HTML documents

No, it does not. It prefers UTF-8, but you can use any other charset you want, as long as you declare it explicitly in an appropriate <meta> element. See Declaring character encodings in HTML.

Does it permit the use of other encodings for externally loaded scripts?

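The <script> element has a charset attribute, although it has been deprecated in favor of the charset parameter of the Content-Type HTTP header sent when the script is retrieved. If charset is present on <script>, it must match the charset in Content-Type. If no charset is specified, the HTML document's charset is assumed.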

It looks like HTML5 permits different encodings for regular (classic) scripts and mandates UTF-8 for JavaScript modules.

To fetch a classic script given a url, a settings object, some options, a CORS setting, and a character encoding, run these steps. The algorithm will asynchronously complete with either null (on failure) or a new classic script (on success).

[...]

  1. If response's Content Type metadata, if any, specifies a character encoding, and the user agent supports that encoding, then set character encoding to that encoding (ignoring the passed-in value).

  2. Let source text be the result of decoding response's body to Unicode, using character encoding as the fallback encoding.

 [...]
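The two quoted steps can be sketched roughly as follows. This is a simplified illustration, not the spec's algorithm: the helper names (charsetFromContentType, isSupported, decodeClassicScript, utf16leWithBom) are hypothetical, the header parsing is a naive regular expression, and "supported" is assumed to mean whatever labels the runtime's TextDecoder accepts.

```javascript
// Simplified sketch of the classic-script decoding steps quoted above.
// All function names here are illustrative assumptions, not spec or browser APIs.

// Extract the charset parameter from a Content-Type header value, if any.
function charsetFromContentType(contentType) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

// Report whether this runtime's TextDecoder accepts the encoding label.
function isSupported(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    return false;
  }
}

// The header charset wins when present and supported;
// otherwise the passed-in fallback encoding is used.
function decodeClassicScript(bytes, contentType, fallbackEncoding) {
  const fromHeader = charsetFromContentType(contentType);
  const encoding =
    fromHeader && isSupported(fromHeader) ? fromHeader : fallbackEncoding;
  return new TextDecoder(encoding).decode(bytes);
}

// Test helper: encode text as UTF-16LE with a leading byte order mark (FF FE).
function utf16leWithBom(text) {
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff;
  bytes[1] = 0xfe;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 + 2 * i] = unit & 0xff; // low byte first (little endian)
    bytes[3 + 2 * i] = unit >> 8;
  }
  return bytes;
}

const classicBody = utf16leWithBom("var x = 1;");
const classicSource = decodeClassicScript(
  classicBody,
  "text/javascript; charset=UTF-16",
  "utf-8"
);
console.log(classicSource);
```

Under these assumptions, a classic script served with charset=UTF-16 decodes to the intended source text, while a response without a charset parameter falls back to the encoding passed in by the caller.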

To fetch a single module script, given a url, a fetch client settings object, a destination, some options, a module map settings object, a referrer, and a top-level module fetch flag, run these steps. The algorithm will asynchronously complete with either null (on failure) or a module script (on success).

[...]

  1. Let source text be the result of UTF-8 decoding response's body.

[...]
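To illustrate why this matters for the UTF-16 setup in the question, here is a minimal sketch of what unconditional UTF-8 decoding does to a UTF-16 body. The function name decodeModuleScript and the sample bytes are assumptions made for the example; a real browser performs this step internally.

```javascript
// Sketch: a module script body is always UTF-8 decoded; any charset
// parameter in the Content-Type header plays no part in this step.
// decodeModuleScript is a hypothetical name for illustration only.
function decodeModuleScript(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// The text "x=1" encoded as UTF-16LE with a byte order mark (FF FE).
const moduleBodyUtf16 = new Uint8Array([
  0xff, 0xfe, 0x78, 0x00, 0x3d, 0x00, 0x31, 0x00,
]);

// The same text encoded as UTF-8.
const moduleBodyUtf8 = new Uint8Array([0x78, 0x3d, 0x31]);

const garbled = decodeModuleScript(moduleBodyUtf16);
const intact = decodeModuleScript(moduleBodyUtf8);

// The UTF-16 body does not survive: the BOM bytes become U+FFFD
// replacement characters and every second byte becomes U+0000.
console.log(JSON.stringify(garbled));
console.log(JSON.stringify(intact));
```

The garbled result would not parse as the intended module source, which is consistent with the reading that module scripts must be served as UTF-8.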