Encoding Emojis from text which includes escape sequences

Question

I'm trying to print some emoji-containing text that comes in this form, text = "\ud83d\ude04\n\u3082\u3042", so that it looks like this:

# my expected output
# a newline after the emoji, then the Japanese characters
>>>😄
もあ

I've seen a question about this, but it only solved part of my problem:

I followed the code from that post and got the following result:

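# the text holds literal backslash escapes (e.g. read from a file), hence the raw string;
# with a plain "..." literal, .encode("latin_1") would raise UnicodeEncodeError on the surrogates
emoji_text = r"\ud83d\ude04\n\u3082\u3042".encode("latin_1")
output = (emoji_text
  .decode("raw_unicode_escape")         # \uXXXX -> code points; the surrogate pair stays as two code points
  .encode('utf-16', 'surrogatepass')    # write the lone surrogates out as UTF-16 code units
  .decode('utf-16')                     # reading them back joins the pair into one emoji
)
print(output)

>>>😄\nもあ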
# it prints \n instead of a new line
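The literal \n survives because raw_unicode_escape only decodes \uXXXX and \UXXXXXXXX escapes and passes every other backslash through unchanged, whereas unicode_escape understands the full set of Python string-literal escapes. A minimal comparison on a hypothetical sample byte string:

```python
# ASCII bytes containing literal backslash escapes (rb"" keeps them unprocessed)
raw = rb"\u3082\n\t\b"

# raw_unicode_escape: only \uXXXX / \UXXXXXXXX are decoded, \n \t \b stay as two characters each
as_raw = raw.decode("raw_unicode_escape")

# unicode_escape: \n, \t, \b (and \uXXXX) all become the characters they denote
as_full = raw.decode("unicode_escape")

print(repr(as_raw))   # 'も\\n\\t\\b'
print(repr(as_full))  # 'も\n\t\x08'
```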

So I'd like to ask: when converting the emojis and text, how can I also convert escape sequences such as \n, \t, \b?

Answer 1

Using unicode_escape instead of raw_unicode_escape will also decode \n. Though if there was a reason you used raw_unicode_escape in the first place, then perhaps this isn't suitable?
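A sketch of the question's pipeline with only that one codec swapped, assuming the input really contains literal backslash escapes (hence the raw string). The UTF-16 round trip is still needed because unicode_escape, like raw_unicode_escape, yields the surrogate pair as two separate code points:

```python
text = r"\ud83d\ude04\n\u3082\u3042"  # literal backslash escapes, as in the question

output = (text
          .encode("latin_1")
          .decode("unicode_escape")            # \uXXXX -> code points, \n -> real newline
          .encode("utf-16", "surrogatepass")   # keep the unpaired surrogates
          .decode("utf-16"))                   # re-pair them into U+1F604
print(output)
# 😄
# もあ
```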

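Your choice to encode to "latin-1" is a bit odd, but perhaps there's a reason for that too. Maybe you should encode to "ascii" instead and be prepared to deal with whatever consequences that brings.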
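One way to see the trade-off, sketched with a hypothetical helper decode_escapes: unicode_escape treats non-escape bytes as Latin-1, so encoding with "latin-1" lets literal characters up to U+00FF (such as é) pass through alongside the escapes, while "ascii" rejects any literal non-ASCII character up front with UnicodeEncodeError. A literal character above U+00FF (for example a typed-in も) fails with both codecs. Reading "consequences" as this error is an assumption:

```python
def decode_escapes(s: str, codec: str) -> str:
    """Hypothetical helper: decode backslash escapes in s, then join surrogate pairs."""
    return (s.encode(codec)
            .decode("unicode_escape")
            .encode("utf-16", "surrogatepass")
            .decode("utf-16"))

mixed = "café \\u3082"  # a literal é plus one escape sequence

print(decode_escapes(mixed, "latin-1"))  # café も

try:
    decode_escapes(mixed, "ascii")
except UnicodeEncodeError as exc:
    # "ascii" refuses the literal é instead of silently passing it through
    print("ascii failed:", type(exc).__name__)
```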

Tags: unicode, encode, utf-16, python-3.x, emoji