\uD83D\uDCCC keeps showing up in code I've inherited. What does this Unicode sequence do?

I've been reading about code injection using Unicode sequences, and I have been using a tool from Dotnetsafer to locate such sequences in the code base I inherited. The sequence \uD83D\uDCCC keeps showing up.

An example:

appears as: [588]                             __builder5.AddMarkupContent(51, "??");
actual    : [588]                             __builder5.AddMarkupContent(51, "\uD83D\uDCCC");

What is this sequence, and why would the code inject it into the HTML?

Edit 1: I looked up the sequence, and the only useful thing I found was https://unicode.scarfboy.com/?s=D83D+DCCC

These are the two UTF-16 code units that encode the Unicode character U+1F4CC (the pushpin emoji).
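To make the mapping concrete, here is a minimal sketch of the standard UTF-16 surrogate-pair arithmetic, written in Python purely for illustration (the inherited code is not Python, and the helper name `combine_surrogates` is made up for this example). It shows how the two code units combine into the single code point U+1F4CC.

```python
import unicodedata


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"{high:#06x} is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"{low:#06x} is not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDCCC)
print(f"U+{code_point:X}")                   # U+1F4CC
print(unicodedata.name(chr(code_point)))     # PUSHPIN

# Cross-check against Python's built-in UTF-16 decoder.
decoded = bytes.fromhex("D83DDCCC").decode("utf-16-be")
print(decoded == chr(code_point))            # True
```

So the string literal in the generated code is simply one emoji character written as escapes, which is why a viewer that cannot render it shows "??".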

How could you have found this out?

  1. Look up U+D83D and U+DCCC and find that they are not actual Unicode characters but a high and a low surrogate respectively, which means they are UTF-16 code units.
  2. Google for "D83D DCCC" and find this page, which explicitly lists the UTF-16 encoding of the pushpin emoji.

Actually, come to think of it, you can skip step 1 ;-)