Does the .html() method automatically encode HTML?

Question

Context

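I have a website where users can write their own articles. I use a contenteditable div so that users can mark up their text with HTML (bold, italic, etc.), and I want to make sure I prevent XSS attacks. To do this, I call htmlspecialchars() when outputting the data to the page. I want to leave the user's input data alone and encode it only on output. However, I use the .html() method to get the HTML content of the contenteditable div, and it seems that the tags the user inserts are encoded automatically, either by the .html() method or by the contenteditable div.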

The structure of the contenteditable div is

<div class="article" contenteditable="true"></div>

For example, suppose a user attempts XSS on the contenteditable div by inserting <script>alert('hi')</script>, like so:

<div class="article" contenteditable="true">
    <script>alert('hi')</script>
</div>

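When I submit the data to the database (using var article = $(".article").html();), the data I see in the database is not <script>alert('hi')</script> but &lt;script&gt;alert('hi')&lt;/script&gt;, even though I have not encoded anything myself. I then have to avoid calling htmlspecialchars() when outputting the article content, so that the div decodes the string back to its original form instead of leaving it double-encoded.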

Question

Does the .html() method automatically encode HTML, and if so, how can I prevent this behavior (if doing so is advisable)?

Answer 1

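OK Carson D, the code below has served me well in my travels.

Here is the source link. As mentioned, they may have updated it to a more modern approach. If you only need a standalone function, remove the module.exports = bit from the code at the link, as shown below:

https://locutus.io/php/strings/htmlspecialchars_decode/

The link may go stale at some point in SO's future, so here is the code itself, with all credit where it is due. This is the code I have been using since at least 2013/14, so it may well not match the code at the link, but I believe the authors are the same.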

function htmlspecialchars_decode (string, quoteStyle) { // eslint-disable-line camelcase
  //       discuss at: https://locutus.io/php/htmlspecialchars_decode/
  //      original by: Mirek Slugen
  //      improved by: Kevin van Zonneveld (https://kvz.io)
  //      bugfixed by: Mateusz "loonquawl" Zalega
  //      bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //      bugfixed by: Brett Zamir (https://brett-zamir.me)
  //      bugfixed by: Brett Zamir (https://brett-zamir.me)
  //         input by: ReverseSyntax
  //         input by: Slawomir Kaniecki
  //         input by: Scott Cariss
  //         input by: Francois
  //         input by: Ratheous
  //         input by: Mailfaker (https://www.weedem.fr/)
  //       revised by: Kevin van Zonneveld (https://kvz.io)
  // reimplemented by: Brett Zamir (https://brett-zamir.me)
  //        example 1: htmlspecialchars_decode("<p>this -&gt; &quot;</p>", 'ENT_NOQUOTES')
  //        returns 1: '<p>this -> &quot;</p>'
  //        example 2: htmlspecialchars_decode("&amp;quot;")
  //        returns 2: '&quot;'

  var optTemp = 0
  var i = 0
  var noquotes = false

  if (typeof quoteStyle === 'undefined') {
    quoteStyle = 2
  }
  string = string.toString()
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
  var OPTS = {
    'ENT_NOQUOTES': 0,
    'ENT_HTML_QUOTE_SINGLE': 1,
    'ENT_HTML_QUOTE_DOUBLE': 2,
    'ENT_COMPAT': 2,
    'ENT_QUOTES': 3,
    'ENT_IGNORE': 4
  }
  if (quoteStyle === 0) {
    noquotes = true
  }
  if (typeof quoteStyle !== 'number') {
    // Allow for a single string or an array of string flags
    quoteStyle = [].concat(quoteStyle)
    for (i = 0; i < quoteStyle.length; i++) {
      // Resolve string input to bitwise e.g. 'ENT_IGNORE' becomes 4
      if (OPTS[quoteStyle[i]] === 0) {
        noquotes = true
      } else if (OPTS[quoteStyle[i]]) {
        optTemp = optTemp | OPTS[quoteStyle[i]]
      }
    }
    quoteStyle = optTemp
  }
  if (quoteStyle & OPTS.ENT_HTML_QUOTE_SINGLE) {
    // PHP doesn't currently escape if more than one 0, but it should:
    string = string.replace(/&#0*39;/g, "'")
    // This would also be useful here, but not a part of PHP:
    // string = string.replace(/&apos;|&#x0*27;/g, "'");
  }
  if (!noquotes) {
    string = string.replace(/&quot;/g, '"')
  }
  // Put this in last place to avoid escape being double-decoded
  string = string.replace(/&amp;/g, '&')

  return string
}
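
To illustrate why the database ends up holding entities in the first place: text typed into a contenteditable div is stored as a text node, and when the browser serializes a text node to HTML (which is what .html() reads) it escapes &, <, > and non-breaking spaces. The following is only a minimal sketch of that round trip, not jQuery's or the browser's actual implementation; escapeTextNode and unescapeTextNode are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: mimics how a browser escapes a text node when
// serializing it to HTML (the string that innerHTML / .html() returns).
function escapeTextNode (text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/\u00a0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Hypothetical helper: reverses the four replacements above.
// &amp; is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
function unescapeTextNode (html) {
  return String(html)
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&nbsp;/g, '\u00a0')
    .replace(/&amp;/g, '&')
}

var typed = "<script>alert('hi')</script>"
var stored = escapeTextNode(typed)      // what ends up in the database
var restored = unescapeTextNode(stored) // what the user originally typed

console.log(stored)             // &lt;script&gt;alert('hi')&lt;/script&gt;
console.log(restored === typed) // true
```

Bear in mind that once the string is decoded it is live markup again, so inserting it into a page as HTML would make the script executable; the decoded form is only safe where it is treated as text.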

Tags: javascript, php, xss, jquery