Decoded string with HTML entities does not equal string literal

Question

I tried to implement the solution proposed in this thread to decode a string containing HTML entities, e.g. "foo&nbsp;bar" to "foo bar".

Visually, it seems to work. But my quick Jest test fails:

Expected: "foobar"
Received: "foobar"

  3 | describe('encryption/decodeHtml', () => {
  4 |   it.each([['foo&nbsp;bar', 'foo bar'], ['foo&shy;bar', 'foobar'], ['foo&amp;bar', 'foo&bar']])('should decode html entities', (val, expected) => {
> 5 |     expect(decodeHtml(val)).toEqual(expected);
    |                             ^
  6 |   })
  7 | });
  8 |

A quick Object.is(decodeHtml('&nbsp;'), ' ') also yields false.

Is there something about JS strings that I am not familiar with?

Answer 1

As Andreas pointed out in the comments, I had forgotten about the byte representation of strings.

Look at this example:

toBytes('foo bar') -> Uint8Array(7) [102, 111, 111, 32, 98, 97, 114]
toBytes(decodeHtml('foo&nbsp;bar')) -> Uint8Array(8) [102, 111, 111, 194, 160, 98, 97, 114]
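
The toBytes helper is not shown in the post. A minimal sketch, assuming it simply encodes the string as UTF-8 with TextEncoder, reproduces the two byte arrays above. The no-break space is written as the escape \u00A0 here so that no DOM-based decodeHtml is needed to run it:

```javascript
// Assumed implementation of the toBytes helper used above (not shown in the original post):
// encode the string as UTF-8 and return the raw bytes.
const toBytes = (str) => new TextEncoder().encode(str);

const plain = toBytes('foo bar');        // regular space, U+0020
const decoded = toBytes('foo\u00A0bar'); // no-break space, U+00A0, which is what &nbsp; decodes to

console.log(plain);   // 7 bytes, the space is the single byte 32
console.log(decoded); // 8 bytes, the no-break space is the two bytes 194, 160
```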

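In hindsight this is obvious, because a breaking space (U+0020) and a non-breaking space (U+00A0) are (of course) different characters. The same applies to &shy;, which decodes to a soft hyphen (U+00AD) rather than to nothing, so the result is not equal to "foobar" even though it looks identical.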
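
One way to make the test pass is to write the expected values with explicit escapes for the invisible characters. The following is only a sketch: the decodeHtml below is a hypothetical stand-in that knows just the three entities from the test, so that the snippet runs in Node without a DOM. It is not the function from the linked thread.

```javascript
// Hypothetical stand-in for decodeHtml that only knows the three entities from the test,
// so the snippet runs without a DOM. The real function from the thread is not reproduced here.
const ENTITIES = { nbsp: '\u00A0', shy: '\u00AD', amp: '&' };
const decodeHtml = (str) =>
  str.replace(/&(nbsp|shy|amp);/g, (_, name) => ENTITIES[name]);

// Expected values written with explicit escapes for the invisible characters.
const cases = [
  ['foo&nbsp;bar', 'foo\u00A0bar'], // no-break space, not ' '
  ['foo&shy;bar', 'foo\u00ADbar'],  // the soft hyphen is kept, not dropped
  ['foo&amp;bar', 'foo&bar'],
];

const results = cases.map(([val, expected]) => Object.is(decodeHtml(val), expected));
console.log(results); // [ true, true, true ]

// The original expectation fails because the two characters differ:
console.log(Object.is(decodeHtml('&nbsp;'), ' '));  // false
console.log('\u00A0'.codePointAt(0).toString(16));  // a0
console.log(' '.codePointAt(0).toString(16));       // 20
```

In the Jest test from the question, the same cases array could be passed to it.each in place of the original literals.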

html

javascript

escaping

jestjs