Preserve charset attribute of meta tag in HTML Blob?

I am generating a client-side HTML redirect like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Déjà vu - Wikipedia</title>
  <script type='text/javascript'>
  document.addEventListener('DOMContentLoaded', function () {
var newHTML = document.createElement('html');
var newHead = document.createElement('head');
var newMeta = document.createElement('meta');
var newTitle = document.createElement('title');
newTitle.text = "Déjà vu - Wikipedia";
newMeta.httpEquiv = "refresh";
newMeta.charset = "utf-8";
newMeta.content = "30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu";
var newBody = document.createElement('body');
var newPar = document.createElement('p');
var newText = document.createTextNode('Loading Déjà vu - Wikipedia...');
newPar.appendChild(newText);
newBody.appendChild(newPar);
newHead.appendChild(newMeta);
newHead.appendChild(newTitle);
newHTML.append(newHead);
newHTML.append(newBody);
var tempAnchor = window.document.createElement('a');
HTMLBlob = new Blob([newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});
tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
tempAnchor.download = "example-redirect.html"
tempAnchor.style.display = 'none';
document.body.appendChild(tempAnchor);
tempAnchor.click();
document.body.removeChild(tempAnchor);

  });
  </script>
  </head>
  <body>
  </body>
</html>

However, when I do this, I lose the charset attribute of the meta element. The output looks like this:

<html><head><meta http-equiv="refresh" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

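This means my browser does not know which encoding to use and does not display the accented characters correctly.

On the other hand, this version displays the accented characters correctly: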

<html><head><meta http-equiv="refresh" charset="utf-8" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

I have reduced it to a minimal example, but the problem persists.

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <title>title</title>
  <script type='text/javascript'>
    document.addEventListener('DOMContentLoaded', function() {
      var newHTML = document.createElement('html');
      var newHead = document.createElement('head');
      var newMeta = document.createElement('meta');
      newMeta.charset = "utf-8";
      newHead.appendChild(newMeta);
      newHTML.append(newHead);
      var tempAnchor = window.document.createElement('a');
      HTMLBlob = new Blob([newHTML.outerHTML], {
        type: 'text/html; charset=UTF-8'
      });
      tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
      tempAnchor.download = "minimal-output.html"
      tempAnchor.style.display = 'none';
      document.body.appendChild(tempAnchor);
      tempAnchor.click();
      document.body.removeChild(tempAnchor);

    });
  </script>
</head>

<body>
</body>

</html>

This is the output:

<html><head><meta></head></html>

This happens in both Firefox 63.0 and Chromium 70.0. Here is a link to the Git repository:

https://github.com/nbeaver/Whosebug_question_2018-11-07

How can I preserve the charset attribute in the HTML blob?

According to this answer to "Set charset meta tag with JavaScript":

You can't set the charset content attribute by setting the charset property because they don't reflect each other. In fact there is no property that reflects the charset content attribute. [...] The character set is established by the parser, so constructing the meta element in JavaScript after the HTML has been parsed will have no effect on the character set of the document at all.

However, in your case, prepending a UTF-8 BOM to the blob may do the trick.

HTMLBlob = new Blob(["\ufeff",newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});
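To see why the BOM approach can work, here is a minimal sketch. The helper names `makeHtmlBlob` and `leadingBytes` are hypothetical and not part of the question's code. String parts passed to the `Blob` constructor are encoded as UTF-8, so `"\ufeff"` ends up as the three bytes EF BB BF at the start of the file. The second helper repeats that encoding with `TextEncoder` so the leading bytes can be inspected directly.

```javascript
// Hypothetical helper: wrap an HTML string in a Blob that starts with a BOM.
function makeHtmlBlob(html) {
  // String parts of a Blob are encoded as UTF-8, so "\ufeff" becomes EF BB BF.
  return new Blob(['\ufeff', html], { type: 'text/html; charset=UTF-8' });
}

// Hypothetical helper: the same UTF-8 encoding, done explicitly with
// TextEncoder so the first three bytes can be shown as hex.
function leadingBytes(html) {
  var bytes = new TextEncoder().encode('\ufeff' + html);
  return Array.from(bytes.slice(0, 3), function (b) {
    return b.toString(16).padStart(2, '0');
  }).join(' ');
}

console.log(leadingBytes('<p>Déjà vu</p>')); // "ef bb bf"
```

A browser that opens the downloaded file and finds these bytes at the start should treat the content as UTF-8 even when the markup declares no charset.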

The HTML <meta> element currently has no dedicated DOM interface for setting the charset attribute. See the specification: https://www.w3.org/TR/html5/document-metadata.html#the-meta-element

newMeta.charset = "utf-8"; only adds an arbitrary charset property of your own to the newMeta JavaScript object. This arbitrary property has no effect on the charset HTML attribute of the <meta> element.
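The difference between the property and the attribute can be checked directly. This is a small sketch, assuming it runs in a browser page; `inspectCharset` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: report what a plain property assignment did to a
// <meta> element. `meta` is expected to be an element from createElement('meta').
function inspectCharset(meta) {
  meta.charset = 'utf-8'; // creates an ordinary JavaScript property
  return {
    property: meta.charset,                  // the value just assigned
    attribute: meta.getAttribute('charset'), // null when no content attribute exists
    serialized: meta.outerHTML               // what ends up in the Blob
  };
}

// Only runs where a DOM is available (a browser page).
if (typeof document !== 'undefined') {
  console.log(inspectCharset(document.createElement('meta')));
}
```

In a browser the `attribute` field should come back as `null` and `serialized` as `<meta>`, which matches the output shown in the question.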

You need to set the charset attribute like this: newMeta.setAttribute("charset", "utf-8");
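Applied to the minimal example from the question, the change might look like the sketch below. The function name `buildMinimalHtml` and its `doc` parameter are assumptions made for illustration; in the original page you would pass `document`.

```javascript
// Hypothetical helper: build the minimal document from the question, but set
// charset as a content attribute so it survives serialization via outerHTML.
// `doc` is expected to be a DOM Document (for example window.document).
function buildMinimalHtml(doc) {
  var newHTML = doc.createElement('html');
  var newHead = doc.createElement('head');
  var newMeta = doc.createElement('meta');
  newMeta.setAttribute('charset', 'utf-8'); // attribute, not a JS property
  newHead.appendChild(newMeta);
  newHTML.appendChild(newHead);
  return newHTML.outerHTML;
}

// Only runs where a DOM is available (a browser page).
if (typeof document !== 'undefined') {
  console.log(buildMinimalHtml(document));
}
```

In a browser this should log `<html><head><meta charset="utf-8"></head></html>`, and the returned string can be passed to `new Blob([...])` exactly as in the question.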