How to convert bytes (UTF-8) with embedded emoji in a string

I am scraping data from a WhatsApp chat backup (chat.txt). It looks like this:

7/21/20, 1:31 PM - mark: Can we look google  
7/21/20, 1:31 PM - elon: No  
7/21/20, 1:31 PM - mark: Can we smile ?  
7/21/20, 1:31 PM - elon: Ya

When I read it line by line with:

with open('chat.txt', 'rb') as file:
    for line in file:
        print(str(line.strip()))

I get:

b'7/21/20, 7:37 AM - mark: Can we look google\xf0\x9f\xa4\xa9\xf0\x9f\x98\x82\xf0\x9f\x98\x82'
b'7/21/20, 7:37 AM - elon: No'
b'7/21/20, 1:31 PM - mark: Can we smile ?'
b'7/21/20, 7:37 AM - elon: Ya\xf0\x9f\x98\x82'

  1. How do I get rid of the b''? (I tried .decode('utf-8'), but it didn't work.)

  2. How do I convert

    Can we look google\xf0\x9f\xa4\xa9\xf0\x9f\x98\x82\xf0\x9f\x98\x82

    into

    Can we look google?
    

Open the file in text mode with the correct encoding instead of in binary mode:

with open('chat.txt', encoding='utf8') as file:
    for line in file:
        print(line, end='')
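To see where the `b''` comes from, here is a minimal sketch using one line copied from the output in the question (the variable names are only for illustration). In binary mode each line is a `bytes` object, and `str()` on bytes returns its repr instead of decoding it, which is why the `b'...'` prefix and the `\x..` escapes appear. Calling `.decode('utf-8')` on the bytes themselves, before any `str()` call, gives the real text. A plausible reason the earlier `.decode('utf-8')` attempt seemed not to work is that it was applied after `str()` (in Python 3 a `str` has no `.decode` method) or that the terminal could not show the result; the question does not say which.

```python
# A line as returned by open('chat.txt', 'rb'): a bytes object, newline included
raw = b'7/21/20, 7:37 AM - elon: Ya\xf0\x9f\x98\x82\n'

# str() on bytes does NOT decode; it returns the repr, hence the b'...'
as_repr = str(raw.strip())

# .decode('utf-8') turns the UTF-8 bytes into a real str, emoji included
as_text = raw.decode('utf-8').strip()

print(as_repr)                          # b'7/21/20, 7:37 AM - elon: Ya\xf0\x9f\x98\x82'
print(as_text.endswith('\U0001F602'))   # True: the last character is the emoji U+1F602
```

Opening the file in text mode with `encoding='utf8'`, as in the answer, performs this decoding on every line automatically.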

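Whether the emoji actually show up depends on your execution environment. Your terminal/IDE and its font must support the code points being printed, but that is a display issue, not a Python problem.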
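If the terminal cannot render emoji, one way to confirm that the decoding itself worked is to print an ASCII-only view of the string. This sketch uses the built-in `ascii()` and the standard `unicodedata.name()`; the sample string is the first chat line from the question, with the emoji written as `\U` escapes so the source stays ASCII. Names for newer emoji such as U+1F929 assume a Python whose Unicode database includes them (for example 3.7 or later).

```python
import unicodedata

# The decoded first line from the question; the escapes are the same three emoji
text = 'Can we look google\U0001F929\U0001F602\U0001F602'

# ascii() escapes every non-ASCII code point, so this prints on any terminal
print(ascii(text))  # 'Can we look google\U0001f929\U0001f602\U0001f602'

# Collect the code point and official Unicode name of each non-ASCII character
emoji_info = [(f'U+{ord(ch):04X}', unicodedata.name(ch)) for ch in text if ord(ch) > 0x7F]
for code, name in emoji_info:
    print(code, name)
```

If the escapes and names match what you expect, the string is correct and only the display is at fault.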